summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2017-09-15Merge tag 'for-linus-4.14-ofs2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs updates from Mike Marshall: "Some cleanups and a big bug fix for ACLs. When I was reviewing Jan Kara's ACL patch, I realized that Orangefs ACL code was busted, not just in the kernel module, but in the server as well. I've been working on the code in the server mostly, but here's one kernel patch, there will be more" * tag 'for-linus-4.14-ofs2' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: Adjust three checks for null pointers orangefs: Use kcalloc() in orangefs_prepare_cdm_array() orangefs: Delete error messages for a failed memory allocation in five functions orangefs: constify xattr_handler structure orangefs: don't call filemap_write_and_wait from fsync orangefs: off by ones in xattr size checks orangefs: documentation clean up orangefs: react properly to posix_acl_update_mode's aftermath. orangefs: Don't clear SGID when inheriting ACLs
2017-09-14vfs: constify path argument to kernel_read_file_from_pathMimi Zohar
This patch constifies the path argument to kernel_read_file_from_path(). Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-14Merge tag 'nfs-for-4.14-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull more NFS client updates from Trond Myklebust: "Hightlights include: Bugfixes: - Various changes relating to reporting IO errors. - pnfs: Use the standard I/O stateid when calling LAYOUTGET Features: - Add static NFS I/O tracepoints for debugging" * tag 'nfs-for-4.14-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: various changes relating to reporting IO errors. NFS: Add static NFS I/O tracepoints pNFS: Use the standard I/O stateid when calling LAYOUTGET
2017-09-14Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc leftovers from Al Viro. * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix the __user misannotations in asm-generic get_user/put_user fput: Don't reinvent the wheel but use existing llist API namespace.c: Don't reinvent the wheel but use existing llist API
2017-09-14Merge branch 'work.read_write' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull nowait read support from Al Viro: "Support IOCB_NOWAIT for buffered reads and block devices" * 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: block_dev: support RFW_NOWAIT on block device nodes fs: support RWF_NOWAIT for buffered reads fs: support IOCB_NOWAIT in generic_file_buffered_read fs: pass iocb to do_generic_file_read
2017-09-14Merge branch 'work.mount' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull mount flag updates from Al Viro: "Another chunk of fmount preparations from dhowells; only trivial conflicts for that part. It separates MS_... bits (very grotty mount(2) ABI) from the struct super_block ->s_flags (kernel-internal, only a small subset of MS_... stuff). This does *not* convert the filesystems to new constants; only the infrastructure is done here. The next step in that series is where the conflicts would be; that's the conversion of filesystems. It's purely mechanical and it's better done after the merge, so if you could run something like list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$') sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \ -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \ -e 's/\<MS_NODEV\>/SB_NODEV/g' \ -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \ -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \ -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \ -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \ -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \ -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \ -e 's/\<MS_SILENT\>/SB_SILENT/g' \ -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \ -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \ -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \ -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \ $list and commit it with something along the lines of 'convert filesystems away from use of MS_... constants' as commit message, it would save a quite a bit of headache next cycle" * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: VFS: Differentiate mount flags (MS_*) from internal superblock flags VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags
2017-09-14Merge branch 'work.set_fs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more set_fs removal from Al Viro: "Christoph's 'use kernel_read and friends rather than open-coding set_fs()' series" * 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: unexport vfs_readv and vfs_writev fs: unexport vfs_read and vfs_write fs: unexport __vfs_read/__vfs_write lustre: switch to kernel_write gadget/f_mass_storage: stop messing with the address limit mconsole: switch to kernel_read btrfs: switch write_buf to kernel_write net/9p: switch p9_fd_read to kernel_write mm/nommu: switch do_mmap_private to kernel_read serial2002: switch serial2002_tty_write to kernel_{read/write} fs: make the buf argument to __kernel_write a void pointer fs: fix kernel_write prototype fs: fix kernel_read prototype fs: move kernel_read to fs/read_write.c fs: move kernel_write to fs/read_write.c autofs4: switch autofs4_write to __kernel_write ashmem: switch to ->read_iter
2017-09-14Merge branch 'work.ipc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull ipc compat cleanup and 64-bit time_t from Al Viro: "IPC copyin/copyout sanitizing, including 64bit time_t work from Deepa Dinamani" * 'work.ipc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: utimes: Make utimes y2038 safe ipc: shm: Make shmid_kernel timestamps y2038 safe ipc: sem: Make sem_array timestamps y2038 safe ipc: msg: Make msg_queue timestamps y2038 safe ipc: mqueue: Replace timespec with timespec64 ipc: Make sys_semtimedop() y2038 safe get rid of SYSVIPC_COMPAT on ia64 semtimedop(): move compat to native shmat(2): move compat to native msgrcv(2), msgsnd(2): move compat to native ipc(2): move compat to native ipc: make use of compat ipc_perm helpers semctl(): move compat to native semctl(): separate all layout-dependent copyin/copyout msgctl(): move compat to native msgctl(): split the actual work from copyin/copyout ipc: move compat shmctl to native shmctl: split the work from copyin/copyout
2017-09-14Merge branch 'zstd-minimal' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull zstd support from Chris Mason: "Nick Terrell's patch series to add zstd support to the kernel has been floating around for a while. After talking with Dave Sterba, Herbert and Phillip, we decided to send the whole thing in as one pull request. zstd is a big win in speed over zlib and in compression ratio over lzo, and the compression team here at FB has gotten great results using it in production. Nick will continue to update the kernel side with new improvements from the open source zstd userland code. Nick has a number of benchmarks for the main zstd code in his lib/zstd commit: I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM. The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor, 16 GB of RAM, and a SSD. I benchmarked using `silesia.tar` [3], which is 211,988,480 B large. Run the following commands for the benchmark: sudo modprobe zstd_compress_test sudo mknod zstd_compress_test c 245 0 sudo cp silesia.tar zstd_compress_test The time is reported by the time of the userland `cp`. The MB/s is computed with 1,536,217,008 B / time(buffer size, hash) which includes the time to copy from userland. The Adjusted MB/s is computed with 1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)). The memory reported is the amount of memory the compressor requests. | Method | Size (B) | Time (s) | Ratio | MB/s | Adj MB/s | Mem (MB) | |----------|----------|----------|-------|---------|----------|----------| | none | 11988480 | 0.100 | 1 | 2119.88 | - | - | | zstd -1 | 73645762 | 1.044 | 2.878 | 203.05 | 224.56 | 1.23 | | zstd -3 | 66988878 | 1.761 | 3.165 | 120.38 | 127.63 | 2.47 | | zstd -5 | 65001259 | 2.563 | 3.261 | 82.71 | 86.07 | 2.86 | | zstd -10 | 60165346 | 13.242 | 3.523 | 16.01 | 16.13 | 13.22 | | zstd -15 | 58009756 | 47.601 | 3.654 | 4.45 | 4.46 | 21.61 | | zstd -19 | 54014593 | 102.835 | 3.925 | 2.06 | 2.06 | 60.15 | | zlib -1 | 77260026 | 2.895 | 2.744 | 73.23 | 75.85 | 0.27 | | zlib -3 | 72972206 | 4.116 | 2.905 | 51.50 | 52.79 | 0.27 | | zlib -6 | 68190360 | 9.633 | 3.109 | 22.01 | 22.24 | 0.27 | | zlib -9 | 67613382 | 22.554 | 3.135 | 9.40 | 9.44 | 0.27 | I benchmarked zstd decompression using the same method on the same machine. The benchmark file is located in the upstream zstd repo under `contrib/linux-kernel/zstd_decompress_test.c` [4]. The memory reported is the amount of memory required to decompress data compressed with the given compression level. If you know the maximum size of your input, you can reduce the memory usage of decompression irrespective of the compression level. | Method | Time (s) | MB/s | Adjusted MB/s | Memory (MB) | |----------|----------|---------|---------------|-------------| | none | 0.025 | 8479.54 | - | - | | zstd -1 | 0.358 | 592.15 | 636.60 | 0.84 | | zstd -3 | 0.396 | 535.32 | 571.40 | 1.46 | | zstd -5 | 0.396 | 535.32 | 571.40 | 1.46 | | zstd -10 | 0.374 | 566.81 | 607.42 | 2.51 | | zstd -15 | 0.379 | 559.34 | 598.84 | 4.61 | | zstd -19 | 0.412 | 514.54 | 547.77 | 8.80 | | zlib -1 | 0.940 | 225.52 | 231.68 | 0.04 | | zlib -3 | 0.883 | 240.08 | 247.07 | 0.04 | | zlib -6 | 0.844 | 251.17 | 258.84 | 0.04 | | zlib -9 | 0.837 | 253.27 | 287.64 | 0.04 | I ran a long series of tests and benchmarks on the btrfs side and the gains are very similar to the core benchmarks Nick ran" * 'zstd-minimal' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: squashfs: Add zstd support btrfs: Add zstd support lib: Add zstd modules lib: Add xxhash module
2017-09-14Merge tag 'for-4.14/dm-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Some request-based DM core and DM multipath fixes and cleanups - Constify a few variables in DM core and DM integrity - Add bufio optimization and checksum failure accounting to DM integrity - Fix DM integrity to avoid checking integrity of failed reads - Fix DM integrity to use init_completion - A couple DM log-writes target fixes - Simplify DAX flushing by eliminating the unnecessary flush abstraction that was stood up for DM's use. * tag 'for-4.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dax: remove the pmem_dax_ops->flush abstraction dm integrity: use init_completion instead of COMPLETION_INITIALIZER_ONSTACK dm integrity: make blk_integrity_profile structure const dm integrity: do not check integrity for failed read operations dm log writes: fix >512b sectorsize support dm log writes: don't use all the cpu while waiting to log blocks dm ioctl: constify ioctl lookup table dm: constify argument arrays dm integrity: count and display checksum failures dm integrity: optimize writing dm-bufio buffers that are partially changed dm rq: do not update rq partially in each ending bio dm rq: make dm-sq requeuing behavior consistent with dm-mq behavior dm mpath: complain about unsupported __multipath_map_bio() return values dm mpath: avoid that building with W=1 causes gcc 7 to complain about fall-through
2017-09-14orangefs: Adjust three checks for null pointersMarkus Elfring
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The script “checkpatch.pl” pointed information out like the following. Comparison to NULL could be written !… Thus fix affected source code places. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-09-14orangefs: Use kcalloc() in orangefs_prepare_cdm_array()Markus Elfring
* A multiplication for the size determination of a memory allocation indicated that an array data structure should be processed. Thus use the corresponding function "kcalloc". This issue was detected by using the Coccinelle software. * Replace the specification of a data structure by a pointer dereference to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-09-14orangefs: Delete error messages for a failed memory allocation in five functionsMarkus Elfring
Omit an extra message for a memory allocation failure in these functions. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-09-14orangefs: constify xattr_handler structureJulia Lawall
The xattr_handler structure is only stored in an array of const structures. Thus the xattr_handler structure itself can be const. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-09-14orangefs: don't call filemap_write_and_wait from fsyncJeff Layton
Orangefs doesn't do buffered writes yet, so there's no point in initiating and waiting for writeback. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-09-14orangefs: off by ones in xattr size checksDan Carpenter
A previous patch which claimed to remove off by ones actually introduced them. strlen() returns the length of the string not including the NUL character. We are using strcpy() to copy "name" into a buffer which is ORANGEFS_MAX_XATTR_NAMELEN characters long. We should make sure to leave space for the NUL, otherwise we're writing one character beyond the end of the buffer. Fixes: e675c5ec51fe ("orangefs: clean up oversize xattr validation") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-09-14orangefs: react properly to posix_acl_update_mode's aftermath.Mike Marshall
posix_acl_update_mode checks to see if the permissions described by the ACL can be encoded into the object's mode. If so, it sets "acl" to NULL and "mode" to the new desired value. Prior to this patch we failed to actually propagate the new mode back to the server. Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-09-14orangefs: Don't clear SGID when inheriting ACLsJan Kara
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit set, DIR1 is expected to have SGID bit set (and owning group equal to the owning group of 'DIR0'). However when 'DIR0' also has some default ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on 'DIR1' to get cleared if user is not member of the owning group. Fix the problem by creating __orangefs_set_acl() function that does not call posix_acl_update_mode() and use it when inheriting ACLs. That prevents SGID bit clearing and the mode has been properly set by posix_acl_create() anyway. Fixes: 073931017b49d9458aa351605b43a7e34598caef CC: stable@vger.kernel.org CC: Mike Marshall <hubcap@omnibond.com> CC: pvfs2-developers@beowulf-underground.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-09-13mm: treewide: remove GFP_TEMPORARY allocation flagMichal Hocko
GFP_TEMPORARY was introduced by commit e12ba74d8ff3 ("Group short-lived and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. It's primary motivation was to allow users to tell that an allocation is short lived and so the allocator can try to place such allocations close together and prevent long term fragmentation. As much as this sounds like a reasonable semantic it becomes much less clear when to use the highlevel GFP_TEMPORARY allocation flag. How long is temporary? Can the context holding that memory sleep? Can it take locks? It seems there is no good answer for those questions. The current implementation of GFP_TEMPORARY is basically GFP_KERNEL | __GFP_RECLAIMABLE which in itself is tricky because basically none of the existing caller provide a way to reclaim the allocated memory. So this is rather misleading and hard to evaluate for any benefits. I have checked some random users and none of them has added the flag with a specific justification. I suspect most of them just copied from other existing users and others just thought it might be a good idea to use without any measuring. This suggests that GFP_TEMPORARY just motivates for cargo cult usage without any reasoning. I believe that our gfp flags are quite complex already and especially those with highlevel semantic should be clearly defined to prevent from confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and replace all existing users to simply use GFP_KERNEL. Please note that SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and so they will be placed properly for memory fragmentation prevention. I can see reasons we might want some gfp flag to reflect shorterm allocations but I propose starting from a clear semantic definition and only then add users with proper justification. This was been brought up before LSF this year by Matthew [1] and it turned out that GFP_TEMPORARY really doesn't have a clear semantic. It seems to be a heuristic without any measured advantage for most (if not all) its current users. The follow up discussion has revealed that opinions on what might be temporary allocation differ a lot between developers. So rather than trying to tweak existing users into a semantic which they haven't expected I propose to simply remove the flag and start from scratch if we really need a semantic for short term allocations. [1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org [akpm@linux-foundation.org: fix typo] [akpm@linux-foundation.org: coding-style fixes] [sfr@canb.auug.org.au: drm/i915: fix up] Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Neil Brown <neilb@suse.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-13fscache: fix fscache_objlist_show format processingArnd Bergmann
gcc points out a minor bug in the handling of unknown cookie types, which could result in a string overflow when the integer is copied into a 3-byte string: fs/fscache/object-list.c: In function 'fscache_objlist_show': fs/fscache/object-list.c:265:19: error: 'sprintf' may write a terminating nul past the end of the destination [-Werror=format-overflow=] sprintf(_type, "%02u", cookie->def->type); ^~~~~~ fs/fscache/object-list.c:265:4: note: 'sprintf' output between 3 and 4 bytes into a destination of size 3 This is currently harmless as no code sets a type other than 0 or 1, but it makes sense to use snprintf() here to avoid overflowing the array if that changes. Link: http://lkml.kernel.org/r/20170714120720.906842-22-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-13procfs: remove unused variableArnd Bergmann
In NOMMU configurations, we get a warning about a variable that has become unused: fs/proc/task_nommu.c: In function 'nommu_vma_show': fs/proc/task_nommu.c:148:28: error: unused variable 'priv' [-Werror=unused-variable] Link: http://lkml.kernel.org/r/20170911200231.3171415-1-arnd@arndb.de Fixes: 1240ea0dc3bb ("fs, proc: remove priv argument from is_stack") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: "This fixes a regression (spotted by the Sandstorm.io folks) in the pid namespace handling introduced in 4.12. There's also a fix for honoring sync/dsync flags for pwritev2()" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: getattr cleanup fuse: honor iocb sync flags on write fuse: allow server to run in different pid_ns
2017-09-13Merge branch 'overlayfs-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs updates from Miklos Szeredi: "This fixes d_ino correctness in readdir, which brings overlayfs on par with normal filesystems regarding inode number semantics, as long as all layers are on the same filesystem. There are also some bug fixes, one in particular (random ioctl's shouldn't be able to modify lower layers) that touches some vfs code, but of course no-op for non-overlay fs" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix false positive ESTALE on lookup ovl: don't allow writing ioctl on lower layer ovl: fix relatime for directories vfs: add flags to d_real() ovl: cleanup d_real for negative ovl: constant d_ino for non-merge dirs ovl: constant d_ino across copy up ovl: fix readdir error value ovl: check snprintf return
2017-09-12Merge tag 'f2fs-for-4.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've mostly tuned f2fs to provide better user experience for Android. Especially, we've worked on atomic write feature again with SQLite community in order to support it officially. And we added or modified several facilities to analyze and enhance IO behaviors. Major changes include: - add app/fs io stat - add inode checksum feature - support project/journalled quota - enhance atomic write with new ioctl() which exposes feature set - enhance background gc/discard/fstrim flows with new gc_urgent mode - add F2FS_IOC_FS{GET,SET}XATTR - fix some quota flows" * tag 'f2fs-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (63 commits) f2fs: hurry up to issue discard after io interruption f2fs: fix to show correct discard_granularity in sysfs f2fs: detect dirty inode in evict_inode f2fs: clear radix tree dirty tag of pages whose dirty flag is cleared f2fs: speed up gc_urgent mode with SSR f2fs: better to wait for fstrim completion f2fs: avoid race in between read xattr & write xattr f2fs: make get_lock_data_page to handle encrypted inode f2fs: use generic terms used for encrypted block management f2fs: introduce f2fs_encrypted_file for clean-up Revert "f2fs: add a new function get_ssr_cost" f2fs: constify super_operations f2fs: fix to wake up all sleeping flusher f2fs: avoid race in between atomic_read & atomic_inc f2fs: remove unneeded parameter of change_curseg f2fs: update i_flags correctly f2fs: don't check inode's checksum if it was dirtied or writebacked f2fs: don't need to update inode checksum for recovery f2fs: trigger fdatasync for non-atomic_write file f2fs: fix to avoid race in between aio and gc ...
2017-09-12Merge tag 'ceph-for-4.14-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "The highlights include: - a large series of fixes and improvements to the snapshot-handling code (Zheng Yan) - individual read/write OSD requests passed down to libceph are now limited to 16M in size to avoid hitting OSD-side limits (Zheng Yan) - encode MStatfs v2 message to allow for more accurate space usage reporting (Douglas Fuller) - switch to the new writeback error tracking infrastructure (Jeff Layton)" * tag 'ceph-for-4.14-rc1' of git://github.com/ceph/ceph-client: (35 commits) ceph: stop on-going cached readdir if mds revokes FILE_SHARED cap ceph: wait on writeback after writing snapshot data ceph: fix capsnap dirty pages accounting ceph: ignore wbc->range_{start,end} when write back snapshot data ceph: fix "range cyclic" mode writepages ceph: cleanup local variables in ceph_writepages_start() ceph: optimize pagevec iterating in ceph_writepages_start() ceph: make writepage_nounlock() invalidate page that beyonds EOF ceph: properly get capsnap's size in get_oldest_context() ceph: remove stale check in ceph_invalidatepage() ceph: queue cap snap only when snap realm's context changes ceph: handle race between vmtruncate and queuing cap snap ceph: fix message order check in handle_cap_export() ceph: fix NULL pointer dereference in ceph_flush_snaps() ceph: adjust 36 checks for NULL pointers ceph: delete an unnecessary return statement in update_dentry_lease() ceph: ENOMEM pr_err in __get_or_create_frag() is redundant ceph: check negative offsets in ceph_llseek() ceph: more accurate statfs ceph: properly set snap follows for cap reconnect ...
2017-09-12xfs: XFS_IS_REALTIME_INODE() should be false if no rt device presentRichard Wareing
If using a kernel with CONFIG_XFS_RT=y and we set the RHINHERIT flag on a directory in a filesystem that does not have a realtime device and create a new file in that directory, it gets marked as a real time file. When data is written and a fsync is issued, the filesystem attempts to flush a non-existent rt device during the fsync process. This results in a crash dereferencing a null buftarg pointer in xfs_blkdev_issue_flush(): BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: xfs_blkdev_issue_flush+0xd/0x20 ..... Call Trace: xfs_file_fsync+0x188/0x1c0 vfs_fsync_range+0x3b/0xa0 do_fsync+0x3d/0x70 SyS_fsync+0x10/0x20 do_syscall_64+0x4d/0xb0 entry_SYSCALL64_slow_path+0x25/0x25 Setting RT inode flags does not require special privileges so any unprivileged user can cause this oops to occur. To reproduce, confirm kernel is compiled with CONFIG_XFS_RT=y and run: # mkfs.xfs -f /dev/pmem0 # mount /dev/pmem0 /mnt/test # mkdir /mnt/test/foo # xfs_io -c 'chattr +t' /mnt/test/foo # xfs_io -f -c 'pwrite 0 5m' -c fsync /mnt/test/foo/bar Or just run xfstests with MKFS_OPTIONS="-d rtinherit=1" and wait. Kernels built with CONFIG_XFS_RT=n are not exposed to this bug. Fixes: f538d4da8d52 ("[XFS] write barrier support") Cc: <stable@vger.kernel.org> Signed-off-by: Richard Wareing <rwareing@fb.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-12f2fs: hurry up to issue discard after io interruptionChao Yu
Once we encounter I/O interruption during issuing discards, we will delay long time before next round, but if system status is I/O idle during the time, it may loses opportunity to issue discards. So this patch changes to hurry up to issue discard after io interruption. Besides, this patch also fixes to issue discards accurately with assigned rate. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-12f2fs: fix to show correct discard_granularity in sysfsChao Yu
Fix below incorrect display when reading discard_granularity sysfs node. $ cat /sys/fs/f2fs/<device>/discard_granularity $ 16 $ echo 32 > /sys/fs/f2fs/<device>/discard_granularity $ cat /sys/fs/f2fs/<device>/discard_granularity $ 16 Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-12f2fs: detect dirty inode in evict_inodeChao Yu
Add a bugon in f2fs_evict_inode to detect inconsistent status between inode cache and related node page cache. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-12ovl: fix false positive ESTALE on lookupAmir Goldstein
Commit b9ac5c274b8c ("ovl: hash overlay non-dir inodes by copy up origin") verifies that the origin lower inode stored in the overlayfs inode matched the inode of a copy up origin dentry found by lookup. There is a false positive result in that check when lower fs does not support file handles and copy up origin cannot be followed by file handle at lookup time. The false negative happens when finding an overlay inode in cache on a copied up overlay dentry lookup. The overlay inode still 'remembers' the copy up origin inode, but the copy up origin dentry is not available for verification. Relax the check in case copy up origin dentry is not available. Fixes: b9ac5c274b8c ("ovl: hash overlay non-dir inodes by copy up...") Cc: <stable@vger.kernel.org> # v4.13 Reported-by: Jordi Pujol <jordipujolp@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-09-12fuse: getattr cleanupMiklos Szeredi
The refreshed argument isn't used by any caller, get rid of it. Use a helper for just updating the inode (no need to fill in a kstat). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-09-12fuse: honor iocb sync flags on writeMiklos Szeredi
If the IOCB_DSYNC flag is set a sync is not being performed by fuse_file_write_iter. Honor IOCB_DSYNC/IOCB_SYNC by setting O_DYSNC/O_SYNC respectively in the flags filed of the write request. We don't need to sync data or metadata, since fuse_perform_write() does write-through and the filesystem is responsible for updating file times. Original patch by Vitaly Zolotusky. Reported-by: Nate Clark <nate@neworld.us> Cc: Vitaly Zolotusky <vitaly@unitc.com>. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-09-12fuse: allow server to run in different pid_nsMiklos Szeredi
Commit 0b6e9ea041e6 ("fuse: Add support for pid namespaces") broke Sandstorm.io development tools, which have been sending FUSE file descriptors across PID namespace boundaries since early 2014. The above patch added a check that prevented I/O on the fuse device file descriptor if the pid namespace of the reader/writer was different from the pid namespace of the mounter. With this change passing the device file descriptor to a different pid namespace simply doesn't work. The check was added because pids are transferred to/from the fuse userspace server in the namespace registered at mount time. To fix this regression, remove the checks and do the following: 1) the pid in the request header (the pid of the task that initiated the filesystem operation) is translated to the reader's pid namespace. If a mapping doesn't exist for this pid, then a zero pid is used. Note: even if a mapping would exist between the initiator task's pid namespace and the reader's pid namespace the pid will be zero if either mapping from initator's to mounter's namespace or mapping from mounter's to reader's namespace doesn't exist. 2) The lk.pid value in setlk/setlkw requests and getlk reply is left alone. Userspace should not interpret this value anyway. Also allow the setlk/setlkw operations if the pid of the task cannot be represented in the mounter's namespace (pid being zero in that case). Reported-by: Kenton Varda <kenton@sandstorm.io> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 0b6e9ea041e6 ("fuse: Add support for pid namespaces") Cc: <stable@vger.kernel.org> # v4.12+ Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Seth Forshee <seth.forshee@canonical.com>
2017-09-11Merge tag 'nfs-for-4.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Hightlights include: Stable bugfixes: - Fix mirror allocation in the writeback code to avoid a use after free - Fix the O_DSYNC writes to use the correct byte range - Fix 2 use after free issues in the I/O code Features: - Writeback fixes to split up the inode->i_lock in order to reduce contention - RPC client receive fixes to reduce the amount of time the xprt->transport_lock is held when receiving data from a socket into am XDR buffer. - Ditto fixes to reduce contention between call side users of the rdma rb_lock, and its use in rpcrdma_reply_handler. - Re-arrange rdma stats to reduce false cacheline sharing. - Various rdma cleanups and optimisations. - Refactor the NFSv4.1 exchange id code and clean up the code. - Const-ify all instances of struct rpc_xprt_ops Bugfixes: - Fix the NFSv2 'sec=' mount option. - NFSv4.1: don't use machine credentials for CLOSE when using 'sec=sys' - Fix the NFSv3 GRANT callback when the port changes on the server. - Fix livelock issues with COMMIT - NFSv4: Use correct inode in _nfs4_opendata_to_nfs4_state() when doing and NFSv4.1 open by filehandle" * tag 'nfs-for-4.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (69 commits) NFS: Count the bytes of skipped subrequests in nfs_lock_and_join_requests() NFS: Don't hold the group lock when calling nfs_release_request() NFS: Remove pnfs_generic_transfer_commit_list() NFS: nfs_lock_and_join_requests and nfs_scan_commit_list can deadlock NFS: Fix 2 use after free issues in the I/O code NFS: Sync the correct byte range during synchronous writes lockd: Delete an error message for a failed memory allocation in reclaimer() NFS: remove jiffies field from access cache NFS: flush data when locking a file to ensure cache coherence for mmap. SUNRPC: remove some dead code. NFS: don't expect errors from mempool_alloc(). xprtrdma: Use xprt_pin_rqst in rpcrdma_reply_handler xprtrdma: Re-arrange struct rx_stats NFS: Fix NFSv2 security settings NFSv4.1: don't use machine credentials for CLOSE when using 'sec=sys' SUNRPC: ECONNREFUSED should cause a rebind. NFS: Remove unused parameter gfp_flags from nfs_pageio_init() NFSv4: Fix up mirror allocation SUNRPC: Add a separate spinlock to protect the RPC request receive list SUNRPC: Cleanup xs_tcp_read_common() ...
2017-09-11f2fs: clear radix tree dirty tag of pages whose dirty flag is clearedDaeho Jeong
On a senario like writing out the first dirty page of the inode as the inline data, we only cleared dirty flags of the pages, but didn't clear the dirty tags of those pages in the radix tree. If we don't clear the dirty tags of the pages in the radix tree, the inodes which contain the pages will be marked with I_DIRTY_PAGES again and again, and writepages() for the inodes will be invoked in every writeback period. As a result, nothing will be done in every writepages() for the inodes and it will just consume CPU time meaninglessly. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-11NFS: various changes relating to reporting IO errors.NeilBrown
1/ remove 'start' and 'end' args from nfs_file_fsync_commit(). They aren't used. 2/ Make nfs_context_set_write_error() a "static inline" in internal.h so we can... 3/ Use nfs_context_set_write_error() instead of mapping_set_error() if nfs_pageio_add_request() fails before sending any request. NFS generally keeps errors in the open_context, not the mapping, so this is more consistent. 4/ If filemap_write_and_write_range() reports any error, still check ctx->error. The value in ctx->error is likely to be more useful. As part of this, NFS_CONTEXT_ERROR_WRITE is cleared slightly earlier, before nfs_file_fsync_commit() is called, rather than at the start of that function. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-11NFS: Add static NFS I/O tracepointsChuck Lever
Tools like tcpdump and rpcdebug can be very useful. But there are plenty of environments where they are difficult or impossible to use. For example, we've had customers report I/O failures during workloads so heavy that collecting network traffic or enabling RPC debugging are themselves onerous. The kernel's static tracepoints are lightweight (less likely to introduce timing changes) and efficient (the trace data is compact). They also work in scenarios where capturing network traffic is not possible due to lack of hardware support (some InfiniBand HCAs) or where data or network privacy is a concern. Introduce tracepoints that show when an NFS READ, WRITE, or COMMIT is initiated, and when it completes. Record the arguments and results of each operation, which are not shown by existing sunrpc module's tracepoints. For instance, the recorded offset and count can be used to match an "initiate" event to a "done" event. If an NFS READ result returns fewer bytes than requested or zero, seeing the EOF flag can be probative. Seeing an NFS4ERR_BAD_STATEID result is also indication of a particular class of problems. The timing information attached to each event record can often be useful as well. Usage example: [root@manet tmp]# trace-cmd record -e nfs:*initiate* -e nfs:*done /sys/kernel/debug/tracing/events/nfs/*initiate*/filter /sys/kernel/debug/tracing/events/nfs/*done/filter Hit Ctrl^C to stop recording ^CKernel buffer statistics: Note: "entries" are the entries left in the kernel ring buffer and are not recorded in the trace data. They should all be zero. CPU: 0 entries: 0 overrun: 0 commit overrun: 0 bytes: 3680 oldest event ts: 78.367422 now ts: 100.124419 dropped events: 0 read events: 74 ... and so on. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-11pNFS: Use the standard I/O stateid when calling LAYOUTGETTrond Myklebust
Instead of having a private method for copying the open/delegation stateid, use the same call that is used for standard I/O through the MDS. Note that this means we transmit the stateid with a zero seqid, avoiding issues with NFS4ERR_OLD_STATEID. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace updates from Eric Biederman: "Life has been busy and I have not gotten half as much done this round as I would have liked. I delayed it so that a minor conflict resolution with the mips tree could spend a little time in linux-next before I sent this pull request. This includes two long delayed user namespace changes from Kirill Tkhai. It also includes a very useful change from Serge Hallyn that allows the security capability attribute to be used inside of user namespaces. The practical effect of this is people can now untar tarballs and install rpms in user namespaces. It had been suggested to generalize this and encode some of the namespace information information in the xattr name. Upon close inspection that makes the things that should be hard easy and the things that should be easy more expensive. Then there is my bugfix/cleanup for signal injection that removes the magic encoding of the siginfo union member from the kernel internal si_code. The mips folks reported the case where I had used FPE_FIXME me is impossible so I have remove FPE_FIXME from mips, while at the same time including a return statement in that case to keep gcc from complaining about unitialized variables. I almost finished the work to get make copy_siginfo_to_user a trivial copy to user. The code is available at: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3 But I did not have time/energy to get the code posted and reviewed before the merge window opened. I was able to see that the security excuse for just copying fields that we know are initialized doesn't work in practice there are buggy initializations that don't initialize the proper fields in siginfo. So we still sometimes copy unitialized data to userspace" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: Introduce v3 namespaced file capabilities mips/signal: In force_fcr31_sig return in the impossible case signal: Remove kernel interal si_code magic fcntl: Don't use ambiguous SIG_POLL si_codes prctl: Allow local CAP_SYS_ADMIN changing exe_file security: Use user_namespace::level to avoid redundant iterations in cap_capable() userns,pidns: Verify the userns for new pid namespaces signal/testing: Don't look for __SI_FAULT in userspace signal/mips: Document a conflict with SI_USER with SIGFPE signal/sparc: Document a conflict with SI_USER with SIGFPE signal/ia64: Document a conflict with SI_USER with SIGFPE signal/alpha: Document a conflict with SI_USER for SIGTRAP
2017-09-11f2fs: speed up gc_urgent mode with SSRJaegeuk Kim
This patch activates SSR in gc_urgent mode. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-11f2fs: better to wait for fstrim completionJaegeuk Kim
In android, we'd better wait for fstrim completion instead of issuing the discard commands asynchronous. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-11Merge tag 'libnvdimm-for-4.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm from Dan Williams: "A rework of media error handling in the BTT driver and other updates. It has appeared in a few -next releases and collected some late- breaking build-error and warning fixups as a result. Summary: - Media error handling support in the Block Translation Table (BTT) driver is reworked to address sleeping-while-atomic locking and memory-allocation-context conflicts. - The dax_device lookup overhead for xfs and ext4 is moved out of the iomap hot-path to a mount-time lookup. - A new 'ecc_unit_size' sysfs attribute is added to advertise the read-modify-write boundary property of a persistent memory range. - Preparatory fix-ups for arm and powerpc pmem support are included along with other miscellaneous fixes" * tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (26 commits) libnvdimm, btt: fix format string warnings libnvdimm, btt: clean up warning and error messages ext4: fix null pointer dereference on sbi libnvdimm, nfit: move the check on nd_reserved2 to the endpoint dax: fix FS_DAX=n BLOCK=y compilation libnvdimm: fix integer overflow static analysis warning libnvdimm, nd_blk: remove mmio_flush_range() libnvdimm, btt: rework error clearing libnvdimm: fix potential deadlock while clearing errors libnvdimm, btt: cache sector_size in arena_info libnvdimm, btt: ensure that flags were also unchanged during a map_read libnvdimm, btt: refactor map entry operations with macros libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path libnvdimm, nfit: export an 'ecc_unit_size' sysfs attribute ext4: perform dax_device lookup at mount ext2: perform dax_device lookup at mount xfs: perform dax_device lookup at mount dax: introduce a fs_dax_get_by_bdev() helper libnvdimm, btt: check memory allocation failure libnvdimm, label: fix index block size calculation ...
2017-09-11dax: remove the pmem_dax_ops->flush abstractionMikulas Patocka
Commit abebfbe2f731 ("dm: add ->flush() dax operation support") is buggy. A DM device may be composed of multiple underlying devices and all of them need to be flushed. That commit just routes the flush request to the first device and ignores the other devices. It could be fixed by adding more complex logic to the device mapper. But there is only one implementation of the method pmem_dax_ops->flush - that is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we don't need the pmem_dax_ops->flush abstraction at all, we can call arch_wb_cache_pmem() directly from dax_flush() because dax_dev->ops->flush can't ever reach anything different from arch_wb_cache_pmem(). It should be also pointed out that for some uses of persistent memory it is needed to flush only a very small amount of data (such as 1 cacheline), and it would be overkill if we go through that device mapper machinery for a single flushed cache line. Fix this by removing the pmem_dax_ops->flush abstraction and call arch_wb_cache_pmem() directly from dax_flush(). Also, remove the device mapper code that forwards the flushes. Fixes: abebfbe2f731 ("dm: add ->flush() dax operation support") Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-09-09Merge tag 'for-linus-20170904' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull MTD updates from Boris Brezillon: "General updates: - Constify pci_device_id in various drivers - Constify device_type - Remove pad control code from the Gemini driver - Use %pOF to print OF node full_name - Various fixes in the physmap_of driver - Remove unused vars in mtdswap - Check devm_kzalloc() return value in the spear_smi driver - Check clk_prepare_enable() return code in the st_spi_fsm driver - Create per MTD device debugfs enties NAND updates, from Boris Brezillon: - Fix memory leaks in the core - Remove unused NAND locking support - Rename nand.h into rawnand.h (preparing support for spi NANDs) - Use NAND_MAX_ID_LEN where appropriate - Fix support for 20nm Hynix chips - Fix support for Samsung and Hynix SLC NANDs - Various cleanup, improvements and fixes in the qcom driver - Fixes for bugs detected by various static code analysis tools - Fix mxc ooblayout definition - Add a new part_parsers to tmio and sharpsl platform data in order to define a custom list of partition parsers - Request the reset line in exclusive mode in the sunxi driver - Fix a build error in the orion-nand driver when compiled for ARMv4 - Allow 64-bit mvebu platforms to select the PXA3XX driver SPI NOR updates, from Cyrille Pitchen and Marek Vasut: - add support to the JEDEC JESD216B specification (SFDP tables). - add support to the Intel Denverton SPI flash controller. - fix error recovery for Spansion/Cypress SPI NOR memories. - fix 4-byte address management for the Aspeed SPI controller. - add support to some Microchip SST26 memory parts - remove unneeded pinctrl header Write a message for tag:" * tag 'for-linus-20170904' of git://git.infradead.org/linux-mtd: (74 commits) mtd: nand: complain loudly when chip->bits_per_cell is not correctly initialized mtd: nand: make Samsung SLC NAND usable again mtd: nand: tmio: Register partitions using the parsers mfd: tmio: Add partition parsers platform data mtd: nand: sharpsl: Register partitions using the parsers mtd: nand: sharpsl: Add partition parsers platform data mtd: nand: qcom: Support for IPQ8074 QPIC NAND controller mtd: nand: qcom: support for IPQ4019 QPIC NAND controller dt-bindings: qcom_nandc: IPQ8074 QPIC NAND documentation dt-bindings: qcom_nandc: IPQ4019 QPIC NAND documentation dt-bindings: qcom_nandc: fix the ipq806x device tree example mtd: nand: qcom: support for different DEV_CMD register offsets mtd: nand: qcom: QPIC data descriptors handling mtd: nand: qcom: enable BAM or ADM mode mtd: nand: qcom: erased codeword detection configuration mtd: nand: qcom: support for read location registers mtd: nand: qcom: support for passing flags in DMA helper functions mtd: nand: qcom: add BAM DMA descriptor handling mtd: nand: qcom: allocate BAM transaction mtd: nand: qcom: DMA mapping support for register read buffer ...
2017-09-09NFS: Count the bytes of skipped subrequests in nfs_lock_and_join_requests()Trond Myklebust
If we skip a subrequest due to a zero refcount, we should still count the byte range that it covered so that we accurately reconstruct the original request size. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-09Merge tag 'nfsd-4.14' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "More RDMA work and some op-structure constification from Chuck Lever, and a small cleanup to our xdr encoding" * tag 'nfsd-4.14' of git://linux-nfs.org/~bfields/linux: svcrdma: Estimate Send Queue depth properly rdma core: Add rdma_rw_mr_payload() svcrdma: Limit RQ depth svcrdma: Populate tail iovec when receiving nfsd: Incoming xdr_bufs may have content in tail buffer svcrdma: Clean up svc_rdma_build_read_chunk() sunrpc: Const-ify struct sv_serv_ops nfsd: Const-ify NFSv4 encoding and decoding ops arrays sunrpc: Const-ify instances of struct svc_xprt_ops nfsd4: individual encoders no longer see error cases nfsd4: skip encoder in trivial error cases nfsd4: define ->op_release for compound ops nfsd4: opdesc will be useful outside nfs4proc.c nfsd4: move some nfsd4 op definitions to xdr4.h
2017-09-09Merge branch 'for-4.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "The changes range through all types: cleanups, core chagnes, sanity checks, fixes, other user visible changes, detailed list below: - deprecated: user transaction ioctl - mount option ssd does not change allocation alignments - degraded read-write mount is allowed if all the raid profile constraints are met, now based on more accurate check - defrag: do not reset compression afterwards; the NOCOMPRESS flag can be now overriden by defrag - prep work for better extent reference tracking (related to the qgroup slowness with balance) - prep work for compression heuristics - memory allocation reductions (may help latencies on a loaded system) - better accounting for io waiting states - error handling improvements (removed BUGs) - added more sanity checks for shared refs - fix readdir vs pagefault deadlock under some circumstances - fix for 'no-hole' mode, certain combination of compressed and inline extents - send: fix emission of invalid clone operations - fixup file mode if setting acls fail - more fixes from fuzzing - oher cleanups" * 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (104 commits) btrfs: submit superblock io with REQ_META and REQ_PRIO btrfs: remove unnecessary memory barrier in btrfs_direct_IO btrfs: remove superfluous chunk_tree argument from btrfs_alloc_dev_extent btrfs: Remove chunk_objectid parameter of btrfs_alloc_dev_extent btrfs: pass fs_info to btrfs_del_root instead of tree_root Btrfs: add one more sanity check for shared ref type Btrfs: remove BUG_ON in __add_tree_block Btrfs: remove BUG() in add_data_reference Btrfs: remove BUG() in print_extent_item Btrfs: remove BUG() in btrfs_extent_inline_ref_size Btrfs: convert to use btrfs_get_extent_inline_ref_type Btrfs: add a helper to retrive extent inline ref type btrfs: scrub: simplify scrub worker initialization btrfs: scrub: clean up division in scrub_find_csum btrfs: scrub: clean up division in __scrub_mark_bitmap btrfs: scrub: use bool for flush_all_writes btrfs: preserve i_mode if __btrfs_set_acl() fails btrfs: Remove extraneous chunk_objectid variable btrfs: Remove chunk_objectid argument from btrfs_make_block_group btrfs: Remove extra parentheses from condition in copy_items() ...
2017-09-09NFS: Don't hold the group lock when calling nfs_release_request()Trond Myklebust
That can deadlock if this is the last reference since nfs_page_group_destroy() calls nfs_page_group_sync_on_bit(). Note that even if the page was removed from the subpage list, the req->wb_head could still be pointing to the old head. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-09NFS: Remove pnfs_generic_transfer_commit_list()Trond Myklebust
It's pretty much a duplicate of nfs_scan_commit_list() that also clears the PG_COMMIT_TO_DS flag. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-09NFS: nfs_lock_and_join_requests and nfs_scan_commit_list can deadlockTrond Myklebust
Since the commit list is not ordered, it is possible for nfs_scan_commit_list to hold a request that nfs_lock_and_join_requests() is waiting for, while at the same time trying to grab a request that nfs_lock_and_join_requests already holds. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>