summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2017-06-09ufs: set correct ->s_maxsizeAl Viro
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-09ufs: restore maintaining ->i_blocksAl Viro
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-09fix ufs_isblockset()Al Viro
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-09ufs: restore proper tail allocationAl Viro
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-09security/selinux: allow security_sb_clone_mnt_opts to enable/disable native ↵Scott Mayhew
labeling behavior When an NFSv4 client performs a mount operation, it first mounts the NFSv4 root and then does path walk to the exported path and performs a submount on that, cloning the security mount options from the root's superblock to the submount's superblock in the process. Unless the NFS server has an explicit fsid=0 export with the "security_label" option, the NFSv4 root superblock will not have SBLABEL_MNT set, and neither will the submount superblock after cloning the security mount options. As a result, setxattr's of security labels over NFSv4.2 will fail. In a similar fashion, NFSv4.2 mounts mounted with the context= mount option will not show the correct labels because the nfs_server->caps flags of the cloned superblock will still have NFS_CAP_SECURITY_LABEL set. Allowing the NFSv4 client to enable or disable SECURITY_LSM_NATIVE_LABELS behavior will ensure that the SBLABEL_MNT flag has the correct value when the client traverses from an exported path without the "security_label" option to one with the "security_label" option and vice versa. Similarly, checking to see if SECURITY_LSM_NATIVE_LABELS is set upon return from security_sb_clone_mnt_opts() and clearing NFS_CAP_SECURITY_LABEL if necessary will allow the correct labels to be displayed for NFSv4.2 mounts mounted with the context= mount option. Resolves: https://github.com/SELinuxProject/selinux-kernel/issues/35 Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov> Tested-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-06-09Btrfs: fix delalloc accounting leak caused by u32 overflowOmar Sandoval
btrfs_calc_trans_metadata_size() does an unsigned 32-bit multiplication, which can overflow if num_items >= 4 GB / (nodesize * BTRFS_MAX_LEVEL * 2). For a nodesize of 16kB, this overflow happens at 16k items. Usually, num_items is a small constant passed to btrfs_start_transaction(), but we also use btrfs_calc_trans_metadata_size() for metadata reservations for extent items in btrfs_delalloc_{reserve,release}_metadata(). In drop_outstanding_extents(), num_items is calculated as inode->reserved_extents - inode->outstanding_extents. The difference between these two counters is usually small, but if many delalloc extents are reserved and then the outstanding extents are merged in btrfs_merge_extent_hook(), the difference can become large enough to overflow in btrfs_calc_trans_metadata_size(). The overflow manifests itself as a leak of a multiple of 4 GB in delalloc_block_rsv and the metadata bytes_may_use counter. This in turn can cause early ENOSPC errors. Additionally, these WARN_ONs in extent-tree.c will be hit when unmounting: WARN_ON(fs_info->delalloc_block_rsv.size > 0); WARN_ON(fs_info->delalloc_block_rsv.reserved > 0); WARN_ON(space_info->bytes_pinned > 0 || space_info->bytes_reserved > 0 || space_info->bytes_may_use > 0); Fix it by casting nodesize to a u64 so that btrfs_calc_trans_metadata_size() does a full 64-bit multiplication. While we're here, do the same in btrfs_calc_trunc_metadata_size(); this can't overflow with any existing uses, but it's better to be safe here than have another hard-to-debug problem later on. Cc: stable@vger.kernel.org Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2017-06-09Btrfs: clear EXTENT_DEFRAG bits in finish_ordered_ioLiu Bo
Before this, we use 'filled' mode here, ie. if all range has been filled with EXTENT_DEFRAG bits, get to clear it, but if the defrag range joins the adjacent delalloc range, then we'll have EXTENT_DEFRAG bits in extent_state until releasing this inode's pages, and that prevents extent_data from being freed. This clears the bit if any was found within the ordered extent. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2017-06-09btrfs: tree-log.c: Wrong printk information about namelenSu Yue
In verify_dir_item, it wants to printk name_len of dir_item but printk data_len acutally. Fix it by calling btrfs_dir_name_len instead of btrfs_dir_data_len. Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2017-06-09block: switch bios to blk_status_tChristoph Hellwig
Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09block_dev: propagate bio_iov_iter_get_pages error in __blkdev_direct_IOChristoph Hellwig
Once we move the block layer to its own status code we'll still want to propagate the bio_iov_iter_get_pages, so restructure __blkdev_direct_IO to take ret into account when returning the errno. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09fs: simplify dio_bio_completeChristoph Hellwig
Only read bio->bi_error once in the common path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09fs: remove the unused error argument to dio_end_io()Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09gfs2: remove the unused sd_log_error fieldChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09tty: add compat_ioctl callbacksAleksa Sarai
In order to avoid future diversions between fs/compat_ioctl.c and drivers/tty/pty.c, define .compat_ioctl callbacks for the relevant tty_operations structs. Since both pty_unix98_ioctl() and pty_bsd_ioctl() are compatible between 32-bit and 64-bit userspace no special translation is required. Signed-off-by: Aleksa Sarai <asarai@suse.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-08xfs: fix spurious spin_is_locked() assert failures on non-smp kernelsBrian Foster
The 0-day kernel test robot reports assertion failures on !CONFIG_SMP kernels due to failed spin_is_locked() checks. As it turns out, spin_is_locked() is hardcoded to return zero on !CONFIG_SMP kernels and so this function cannot be relied on to verify spinlock state in this configuration. To avoid this problem, replace the associated asserts with lockdep variants that do the right thing regardless of kernel configuration. Drop the one assert that checks for an unlocked lock as there is no suitable lockdep variant for that case. This moves the spinlock checks from XFS debug code to lockdep, but generally provides the same level of protection. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-06-08crypto: Work around deallocated stack frame reference gcc bug on sparc.David Miller
On sparc, if we have an alloca() like situation, as is the case with SHASH_DESC_ON_STACK(), we can end up referencing deallocated stack memory. The result can be that the value is clobbered if a trap or interrupt arrives at just the right instruction. It only occurs if the function ends returning a value from that alloca() area and that value can be placed into the return value register using a single instruction. For example, in lib/libcrc32c.c:crc32c() we end up with a return sequence like: return %i7+8 lduw [%o5+16], %o0 ! MEM[(u32 *)__shash_desc.1_10 + 16B], %o5 holds the base of the on-stack area allocated for the shash descriptor. But the return released the stack frame and the register window. So if an intererupt arrives between 'return' and 'lduw', then the value read at %o5+16 can be corrupted. Add a data compiler barrier to work around this problem. This is exactly what the gcc fix will end up doing as well, and it absolutely should not change the code generated for other cpus (unless gcc on them has the same bug :-) With crucial insight from Eric Sandeen. Cc: <stable@vger.kernel.org> Reported-by: Anatoly Pugachev <matorola@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-06-07rxrpc: Provide a cmsg to specify the amount of Tx data for a callDavid Howells
Provide a control message that can be specified on the first sendmsg() of a client call or the first sendmsg() of a service response to indicate the total length of the data to be transmitted for that call. Currently, because the length of the payload of an encrypted DATA packet is encrypted in front of the data, the packet cannot be encrypted until we know how much data it will hold. By specifying the length at the beginning of the transmit phase, each DATA packet length can be set before we start loading data from userspace (where several sendmsg() calls may contribute to a particular packet). An error will be returned if too little or too much data is presented in the Tx phase. Signed-off-by: David Howells <dhowells@redhat.com>
2017-06-05NFSv4.2: Don't send mode again in post-EXCLUSIVE4_1 SETATTR with umaskBenjamin Coddington
Now that we have umask support, we shouldn't re-send the mode in a SETATTR following an exclusive CREATE, or we risk having the same problem fixed in commit 5334c5bdac92 ("NFS: Send attributes in OPEN request for NFS4_CREATE_EXCLUSIVE4_1"), which is that files with S_ISGID will have that bit stripped away. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Fixes: dff25ddb4808 ("nfs: add support for the umask attribute") Cc: stable@vger.kernel.org # v4.10+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-06-05overlayfs: use uuid_t instead of uuid_beChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2017-06-05fs: switch ->s_uuid to uuid_tChristoph Hellwig
For some file systems we still memcpy into it, but in various places this already allows us to use the proper uuid helpers. More to come.. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> (Changes to IMA/EVM) Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2017-06-05xfs: use the common helper uuid_is_null()Amir Goldstein
Use the common helper uuid_is_null() and remove the xfs specific helper uuid_is_nil(). The common helper does not check for the NULL pointer value as xfs helper did, but xfs code never calls the helper with a pointer that can be NULL. Conform comments and warning strings to use the term 'null uuid' instead of 'nil uuid', because this is the terminology used by lib/uuid.c and its users. It is also the terminology used in userspace by libuuid and xfsprogs. Signed-off-by: Amir Goldstein <amir73il@gmail.com> [hch: remove now unused uuid.[ch]] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2017-06-05xfs: remove uuid_getnodeuniq and xfs_uu_tChristoph Hellwig
Opencode uuid_getnodeuniq in the only caller, and directly decode the uuid_t representation instead of using a structure cast for it. Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-06-05uuid: hoist helpers uuid_equal() and uuid_copy() from xfsChristoph Hellwig
These helper are used to compare and copy two uuid_t type objects. Signed-off-by: Amir Goldstein <amir73il@gmail.com> [hch: also provide the respective guid_ versions] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2017-06-05uuid: rename uuid typesChristoph Hellwig
Our "little endian" UUID really is a Wintel GUID, so rename it and its helpers such (guid_t). The big endian UUID is the only true one, so give it the name uuid_t. The uuid_le and uuid_be names are retained for now, but will hopefully go away soon. The exception to that are the _cmp helpers that will be replaced by better primitives ASAP and thus don't get the new names. Also the _to_bin helpers are named to match the better named uuid_parse routine in userspace. Also remove the existing typedef in XFS that's now been superceeded by the generic type name. Signed-off-by: Christoph Hellwig <hch@lst.de> [andy: also update the UUID_LE/UUID_BE macros including fallout] Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-06-05nfsd: namespace-prefix uuid_parseChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-06-05xfs: use uuid_be to implement the uuid_t typeChristoph Hellwig
Use the generic Linux definition to implement our UUID type, this will allow using more generic infrastructure in the future. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-06-05xfs: use uuid_copy() helper to abstract uuid_tAmir Goldstein
uuid_t definition is about to change. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-06-05uuid,afs: move struct uuid_v1 back into afsChristoph Hellwig
This essentially is a partial revert of commit ff548773 ("afs: Move UUID struct to linux/uuid.h") and moves struct uuid_v1 back into fs/afs as struct afs_uuid. It however keeps it as big endian structure so that we can use the normal uuid generation helpers when casting to/from struct afs_uuid. The V1 uuid intrepretation in struct form isn't really useful to the rest of the kernel, and not really compatible to it either, so move it back to AFS instead of polluting the global uuid.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David Howells <dhowells@redhat.com>
2017-06-04fs/ufs: Set UFS default maximum bytes per fileRichard Narron
This fixes a problem with reading files larger than 2GB from a UFS-2 file system: https://bugzilla.kernel.org/show_bug.cgi?id=195721 The incorrect UFS s_maxsize limit became a problem as of commit c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()") which started using s_maxbytes to avoid a page index overflow in do_generic_file_read(). That caused files to be truncated on UFS-2 file systems because the default maximum file size is 2GB (MAX_NON_LFS) and UFS didn't update it. Here I simply increase the default to a common value used by other file systems. Signed-off-by: Richard Narron <comet.berkeley@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Will B <will.brokenbourgh2877@gmail.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: <stable@vger.kernel.org> # v4.9 and backports of c2a9737f45e2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-04Merge tag 'nfs-for-4.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Bugfixes include: - Fix a typo in commit e092693443b ("NFS append COMMIT after synchronous COPY") that breaks copy offload - Fix the connect error propagation in xs_tcp_setup_socket() - Fix a lock leak in nfs40_walk_client_list - Verify that pNFS requests lie within the offset range of the layout segment" * tag 'nfs-for-4.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: Mark unnecessarily extern functions as static SUNRPC: ensure correct error is reported by xs_tcp_setup_socket() NFSv4.0: Fix a lock leak in nfs40_walk_client_list pnfs: Fix the check for requests in range of layout segment xprtrdma: Delete an error message for a failed memory allocation in xprt_rdma_bc_setup() pNFS/flexfiles: missing error code in ff_layout_alloc_lseg() NFS fix COMMIT after COPY
2017-06-04compat statfs: switch to copy_to_user()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-03nfs: Mark unnecessarily extern functions as staticJan Kara
nfs_initialise_sb() and nfs_clone_super() are declared as extern even though they are used only in fs/nfs/super.c. Mark them as static. Also remove explicit 'inline' directive from nfs_initialise_sb() and leave it upto compiler to decide whether inlining is worth it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-06-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "15 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: scripts/gdb: make lx-dmesg command work (reliably) mm: consider memblock reservations for deferred memory initialization sizing mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified mlock: fix mlock count can not decrease in race condition mm/migrate: fix refcount handling when !hugepage_migration_supported() dax: fix race between colliding PMD & PTE entries mm: avoid spurious 'bad pmd' warning messages mm/page_alloc.c: make sure OOM victim can try allocations with no watermarks once pcmcia: remove left-over %Z format slub/memcg: cure the brainless abuse of sysfs attributes initramfs: fix disabling of initramfs (and its compression) mm: clarify why we want kmalloc before falling backto vmallock frv: declare jiffies to be located in the .data section include/linux/gfp.h: fix ___GFP_NOLOCKDEP value ksm: prevent crash after write_protect_page fails
2017-06-02dax: fix race between colliding PMD & PTE entriesRoss Zwisler
We currently have two related PMD vs PTE races in the DAX code. These can both be easily triggered by having two threads reading and writing simultaneously to the same private mapping, with the key being that private mapping reads can be handled with PMDs but private mapping writes are always handled with PTEs so that we can COW. Here is the first race: CPU 0 CPU 1 (private mapping write) __handle_mm_fault() create_huge_pmd() - FALLBACK handle_pte_fault() passes check for pmd_devmap() (private mapping read) __handle_mm_fault() create_huge_pmd() dax_iomap_pmd_fault() inserts PMD dax_iomap_pte_fault() does a PTE fault, but we already have a DAX PMD installed in our page tables at this spot. Here's the second race: CPU 0 CPU 1 (private mapping read) __handle_mm_fault() passes check for pmd_none() create_huge_pmd() dax_iomap_pmd_fault() inserts PMD (private mapping write) __handle_mm_fault() create_huge_pmd() - FALLBACK (private mapping read) __handle_mm_fault() passes check for pmd_none() create_huge_pmd() handle_pte_fault() dax_iomap_pte_fault() inserts PTE dax_iomap_pmd_fault() inserts PMD, but we already have a PTE at this spot. The core of the issue is that while there is isolation between faults to the same range in the DAX fault handlers via our DAX entry locking, there is no isolation between faults in the code in mm/memory.c. This means for instance that this code in __handle_mm_fault() can run: if (pmd_none(*vmf.pmd) && transparent_hugepage_enabled(vma)) { ret = create_huge_pmd(&vmf); But by the time we actually get to run the fault handler called by create_huge_pmd(), the PMD is no longer pmd_none() because a racing PTE fault has installed a normal PMD here as a parent. This is the cause of the 2nd race. The first race is similar - there is the following check in handle_pte_fault(): } else { /* See comment in pte_alloc_one_map() */ if (pmd_devmap(*vmf->pmd) || pmd_trans_unstable(vmf->pmd)) return 0; So if a pmd_devmap() PMD (a DAX PMD) has been installed at vmf->pmd, we will bail and retry the fault. This is correct, but there is nothing preventing the PMD from being installed after this check but before we actually get to the DAX PTE fault handlers. In my testing these races result in the following types of errors: BUG: Bad rss-counter state mm:ffff8800a817d280 idx:1 val:1 BUG: non-zero nr_ptes on freeing mm: 15 Fix this issue by having the DAX fault handlers verify that it is safe to continue their fault after they have taken an entry lock to block other racing faults. [ross.zwisler@linux.intel.com: improve fix for colliding PMD & PTE entries] Link: http://lkml.kernel.org/r/20170526195932.32178-1-ross.zwisler@linux.intel.com Link: http://lkml.kernel.org/r/20170522215749.23516-2-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reported-by: Pawel Lebioda <pawel.lebioda@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Pawel Lebioda <pawel.lebioda@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Xiong Zhou <xzhou@redhat.com> Cc: Eryu Guan <eguan@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02Merge tag 'xfs-4.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull XFS fix from Darrick Wong: "I've one more bugfix for you for 4.12-rc4: Fix an unmount hang due to a race in io buffer accounting" * tag 'xfs-4.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: use ->b_state to fix buffer I/O accounting release race
2017-06-01Merge tag 'nfsd-4.12-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd fixes from Bruce Fields: "Revert patch accidentally included in the merge window pull request, and fix a crash that was likely a result of buggy client behavior" * tag 'nfsd-4.12-1' of git://linux-nfs.org/~bfields/linux: nfsd4: fix null dereference on replay nfsd: Revert "nfsd: check for oversized NFSv2/v3 arguments"
2017-06-01Merge tag 'gcc-plugins-v4.12-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull gcc-plugin prepwork from Kees Cook: "Use designated initializers for mtk-vcodec, powerplay, amdgpu, and sgi-xp. Use ERR_CAST() to avoid cross-structure cast in ocf2, ntfs, and NFS. Christoph Hellwig recommended that I send these fixes now, rather than waiting for the v4.13 merge window. These are all initializer and cast fixes needed for the future randstruct plugin that haven't been picked up by the respective maintainers" * tag 'gcc-plugins-v4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: mtk-vcodec: Use designated initializers drm/amd/powerplay: Use designated initializers drm/amdgpu: Use designated initializers sgi-xp: Use designated initializers ocfs2: Use ERR_CAST() to avoid cross-structure cast ntfs: Use ERR_CAST() to avoid cross-structure cast NFS: Use ERR_CAST() to avoid cross-structure cast
2017-06-01nfsd: Check queue type before submitting a SCSI requestBart Van Assche
Since using scsi_req() is only allowed against request queues for which struct scsi_request is the first member of their private request data, refuse to submit SCSI commands against a queue for which this is not the case. References: commit 82ed4db499b8 ("block: split scsi_request out of struct request") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: J. Bruce Fields <bfields@redhat.com> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: Omar Sandoval <osandov@fb.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-01Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull Reiserfs and GFS2 fixes from Jan Kara: "Fixes to GFS2 & Reiserfs for the fallout of the recent WRITE_FUA cleanup from Christoph. Fixes for other filesystems were already merged by respective maintainers." * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: reiserfs: Make flush bios explicitely sync gfs2: Make flush bios explicitely sync
2017-06-01fs/locks: don't mess with the address limit in compat_fcntl64Christoph Hellwig
Instead write a proper compat syscall that calls common helpers. [ jlayton: fix pointer dereferencing in fixup_compat_flock ] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com>
2017-06-01btrfs: fix race with relocation recovery and fs_root setupJeff Mahoney
If we have to recover relocation during mount, we'll ultimately have to evict the orphan inode. That goes through the reservation dance, where priority_reclaim_metadata_space and flush_space expect fs_info->fs_root to be valid. That's the next thing to be set up during mount, so we crash, almost always in flush_space trying to join the transaction but priority_reclaim_metadata_space is possible as well. This call path has been problematic in the past WRT whether ->fs_root is valid yet. Commit 957780eb278 (Btrfs: introduce ticketed enospc infrastructure) added new users that are called in the direct path instead of the async path that had already been worked around. The thing is that we don't actually need the fs_root, specifically, for anything. We either use it to determine whether the root is the chunk_root for use in choosing an allocation profile or as a root to pass btrfs_join_transaction before immediately committing it. Anything that isn't the chunk root works in the former case and any root works in the latter. A simple fix is to use a root we know will always be there: the extent_root. Cc: <stable@vger.kernel.org> # v4.8+ Fixes: 957780eb278 (Btrfs: introduce ticketed enospc infrastructure) Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-01btrfs: fix memory leak in update_space_info failure pathJeff Mahoney
If we fail to add the space_info kobject, we'll leak the memory for the percpu counter. Fixes: 6ab0a2029c (btrfs: publish allocation data in sysfs) Cc: <stable@vger.kernel.org> # v3.14+ Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-01btrfs: use correct types for page indices in btrfs_page_exists_in_rangeDavid Sterba
Variables start_idx and end_idx are supposed to hold a page index derived from the file offsets. The int type is not the right one though, offsets larger than 1 << 44 will get silently trimmed off the high bits. (1 << 44 is 16TiB) What can go wrong, if start is below the boundary and end gets trimmed: - if there's a page after start, we'll find it (radix_tree_gang_lookup_slot) - the final check "if (page->index <= end_idx)" will unexpectedly fail The function will return false, ie. "there's no page in the range", although there is at least one. btrfs_page_exists_in_range is used to prevent races in: * in hole punching, where we make sure there are not pages in the truncated range, otherwise we'll wait for them to finish and redo truncation, but we're going to replace the pages with holes anyway so the only problem is the intermediate state * lock_extent_direct: we want to make sure there are no pages before we lock and start DIO, to prevent stale data reads For practical occurence of the bug, there are several constaints. The file must be quite large, the affected range must cross the 16TiB boundary and the internal state of the file pages and pending operations must match. Also, we must not have started any ordered data in the range, otherwise we don't even reach the buggy function check. DIO locking tries hard in several places to avoid deadlocks with buffered IO and avoids waiting for ranges. The worst consequence seems to be stale data read. CC: Liu Bo <bo.li.liu@oracle.com> CC: stable@vger.kernel.org # 3.16+ Fixes: fc4adbff823f7 ("btrfs: Drop EXTENT_UPTODATE check in hole punching and direct locking") Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-05-31pstore: Fix format string to use %u for record idKees Cook
The format string for record->id (u64) was using %lld instead of %llu. Signed-off-by: Kees Cook <keescook@chromium.org>
2017-05-31pstore: Populate pstore record->time fieldKees Cook
The current time will be initially available in the record->time field for all pstore_read() and pstore_write() calls. Backends can either update the field during read(), or use the field during write() instead of fetching time themselves. Signed-off-by: Kees Cook <keescook@chromium.org>
2017-05-31pstore: Create common record initializerKees Cook
In preparation for setting timestamps in the pstore core, create a common initializer routine, instead of using static initializers. Signed-off-by: Kees Cook <keescook@chromium.org>
2017-05-31pstore: Avoid potential infinite loopKees Cook
If a backend does not correctly iterate through its records, pstore will get stuck loading entries. Detect this with a large record count, and announce if we ever hit the limit. This will let future backend reading bugs less annoying to debug. Additionally adjust the error about pstore_mkfile() failing. Signed-off-by: Kees Cook <keescook@chromium.org>
2017-05-31pstore: Fix leaked pstore_record in pstore_get_backend_records()Douglas Anderson
When the "if (record->size <= 0)" test is true in pstore_get_backend_records() it's pretty clear that nobody holds a reference to the allocated pstore_record, yet we don't free it. Let's free it. Fixes: 2a2b0acf768c ("pstore: Allocate records on heap instead of stack") Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org
2017-05-31pstore: Don't warn if data is uncompressed and type is not PSTORE_TYPE_DMESGAnkit Kumar
commit 9abdcccc3d5f ("pstore: Extract common arguments into structure") moved record decompression to function. decompress_record() gets called without checking type and compressed flag. Warning will be reported if data is uncompressed. Pstore type PSTORE_TYPE_PPC_OPAL, PSTORE_TYPE_PPC_COMMON doesn't contain compressed data and warning get printed part of dmesg. Partial dmesg log: [ 35.848914] pstore: ignored compressed record type 6 [ 35.848927] pstore: ignored compressed record type 8 Above warning should not get printed as it is known that data won't be compressed for above type and it is valid condition. This patch returns if data is not compressed and print warning only if data is compressed and type is not PSTORE_TYPE_DMESG. Reported-by: Anton Blanchard <anton@au1.ibm.com> Signed-off-by: Ankit Kumar <ankit@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Kees Cook <keescook@chromium.org> Fixes: 9abdcccc3d5f ("pstore: Extract common arguments into structure") Cc: stable@vger.kernel.org
2017-05-31Merge branch 'overlayfs-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "Fix regressions: - missing CONFIG_EXPORTFS dependency - failure if upper fs doesn't support xattr - bad error cleanup This also adds the concept of "impure" directories complementing the "origin" marking introduced in -rc1. Together they enable getting consistent st_ino and d_ino for directory listings. And there's a bug fix and a cleanup as well" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: filter trusted xattr for non-admin ovl: mark upper merge dir with type origin entries "impure" ovl: mark upper dir with type origin entries "impure" ovl: remove unused arg from ovl_lookup_temp() ovl: handle rename when upper doesn't support xattr ovl: don't fail copy-up if upper doesn't support xattr ovl: check on mount time if upper fs supports setting xattr ovl: fix creds leak in copy up error path ovl: select EXPORTFS