summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2017-05-02Merge branch 'work.compat' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fs/compat.c cleanups from Al Viro: "More moving of compat syscalls from fs/compat.c to fs/*.c where the native counterparts live. And death to compat_sys_getdents64() - the only architecture that used to need it was ia64, and _that_ has lost biarch support quite a few years ago" * 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs/compat.c: trim unused includes move compat_rw_copy_check_uvector() over to fs/read_write.c fhandle: move compat syscalls from compat.c open: move compat syscalls from compat.c stat: move compat syscalls from compat.c fcntl: move compat syscalls from compat.c readdir: move compat syscalls from compat.c statfs: move compat syscalls from compat.c utimes: move compat syscalls from compat.c move compat select-related syscalls to fs/select.c Remove compat_sys_getdents64()
2017-05-02Merge branch 'work.splice' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull splice updates from Al Viro: "These actually missed the last cycle; the branch itself is from last December" * 'work.splice' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: make nr_pages calculation in default_file_splice_read() a bit less ugly splice/tee/vmsplice: validate flags splice_pipe_desc: kill ->flags remove spd_release_page()
2017-05-02Merge branch 'work.iov_iter' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull iov_iter updates from Al Viro: "Cleanups that sat in -next + -stable fodder that has just missed 4.11. There's more iov_iter work in my local tree, but I'd prefer to push the stuff that had been in -next first" * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: iov_iter: don't revert iov buffer if csum error generic_file_read_iter(): make use of iov_iter_revert() generic_file_direct_write(): make use of iov_iter_revert() orangefs: use iov_iter_revert() sctp: switch to copy_from_iter_full() net/9p: switch to copy_from_iter_full() switch memcpy_from_msg() to copy_from_iter_full() rds: make use of iov_iter_revert()
2017-05-02Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS fixes from Steve French: "Three cifs/smb3 fixes - including two for stable" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: cifs: don't check for failure from mempool_alloc() Do not return number of bytes written for ioctl CIFS_IOC_COPYCHUNK_FILE Fix match_prepath()
2017-05-02Merge tag 'pstore-v4.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore updates from Kees Cook: "This has a large internal refactoring along with several smaller fixes. - constify compression structures; Bhumika Goyal - restore powerpc dumping; Ankit Kumar - fix more bugs in the rarely exercises module unloading logic - reorganize filesystem locking to fix problems noticed by lockdep - refactor internal pstore APIs to make development and review easier: - improve error reporting - add kernel-doc structure and function comments - avoid insane argument passing by using a common record structure" * tag 'pstore-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (23 commits) pstore: Solve lockdep warning by moving inode locks pstore: Fix flags to enable dumps on powerpc pstore: Remove unused vmalloc.h in pmsg pstore: simplify write_user_compat() pstore: Remove write_buf() callback pstore: Replace arguments for write_buf_user() API pstore: Replace arguments for write_buf() API pstore: Replace arguments for erase() API pstore: Do not duplicate record metadata pstore: Allocate records on heap instead of stack pstore: Pass record contents instead of copying pstore: Always allocate buffer for decompression pstore: Replace arguments for write() API pstore: Replace arguments for read() API pstore: Switch pstore_mkfile to pass record pstore: Move record decompression to function pstore: Extract common arguments into structure pstore: Add kernel-doc for struct pstore_info pstore: Improve register_pstore() error reporting pstore: Avoid race in module unloading ...
2017-05-02pNFS: Fix a typo in pnfs_generic_alloc_ds_commitsTrond Myklebust
If the layout segment is invalid, we want to just resend the remaining writes. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-02pNFS: Fix a deadlock when coalescing writes and returning the layoutTrond Myklebust
Consider the following deadlock: Process P1 Process P2 Process P3 ========== ========== ========== lock_page(page) lseg = pnfs_update_layout(inode) lo = NFS_I(inode)->layout pnfs_error_mark_layout_for_return(lo) lock_page(page) lseg = pnfs_update_layout(inode) In this scenario, - P1 has declared the layout to be in error, but P2 holds a reference to a layout segment on that inode, so the layoutreturn is deferred. - P2 is waiting for a page lock held by P3. - P3 is asking for a new layout segment, but is blocked waiting for the layoutreturn. The fix is to ensure that pnfs_error_mark_layout_for_return() does not set the NFS_LAYOUT_RETURN flag, which blocks P3. Instead, we allow the latter to call LAYOUTGET so that it can make progress and unblock P2. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-02pNFS: Don't clear the layout return info if there are segments to returnTrond Myklebust
In pnfs_clear_layoutreturn_info, ensure that we don't clear the layout return info if there are new segments queued for return due to, for instance, a race between a LAYOUTRETURN and a failed I/O attempt. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-02ext4: inherit encryption xattr before other xattrsEric Biggers
When using both encryption and SELinux (or another feature that requires an xattr per file) on a filesystem with 256-byte inodes, each file's xattrs usually spill into an external xattr block. Currently, the xattrs are inherited in the order ACL, security, then encryption. Therefore, if spillage occurs, the encryption xattr will always end up in the external block. This is not ideal because the encryption xattrs contain a nonce, so they will always be unique and will prevent the external xattr blocks from being deduplicated. To improve the situation, change the inheritance order to encryption, ACL, then security. This gives the encryption xattr a better chance to be stored in-inode, allowing the other xattr(s) to be deduplicated. Note that it may be better for userspace to format the filesystem with 512-byte inodes in this case. However, it's not the default. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-05-01Merge branch 'x86-process-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pul x86/process updates from Ingo Molnar: "The main change in this cycle was to add the ARCH_[GET|SET]_CPUID prctl() ABI extension to control the availability of the CPUID instruction, analogously to the existing PR_GET|SET_TSC ABI that controls RDTSC. Motivation: the 'rr' user-space record-and-replay execution debugger would like to trap and emulate the CPUID instruction - which instruction is normally unprivileged. Trapping CPUID is possible on IvyBridge and later Intel CPUs - expose this hardware capability" * 'x86-process-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/syscalls/32: Ignore arch_prctl for other architectures um/arch_prctl: Fix fallout from x86 arch_prctl() rework x86/arch_prctl: Add ARCH_[GET|SET]_CPUID x86/cpufeature: Detect CPUID faulting support x86/syscalls/32: Wire up arch_prctl on x86-32 x86/arch_prctl: Add do_arch_prctl_common() x86/arch_prctl/64: Rename do_arch_prctl() to do_arch_prctl_64() x86/arch_prctl/64: Use SYSCALL_DEFINE2 to define sys_arch_prctl() x86/arch_prctl: Rename 'code' argument to 'option' x86/msr: Rename MISC_FEATURE_ENABLES to MISC_FEATURES_ENABLES x86/process: Optimize TIF_NOTSC switch x86/process: Correct and optimize TIF_BLOCKSTEP switch x86/process: Optimize TIF checks in __switch_to_xtra()
2017-05-01Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - another round of rq-clock handling debugging, robustization and fixes - PELT accounting improvements - CPU hotplug related ->cpus_allowed affinity handling fixes all around the tree - ... plus misc fixes, cleanups and updates" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits) sched/x86: Update reschedule warning text crypto: N2 - Replace racy task affinity logic cpufreq/sparc-us2e: Replace racy task affinity logic cpufreq/sparc-us3: Replace racy task affinity logic cpufreq/sh: Replace racy task affinity logic cpufreq/ia64: Replace racy task affinity logic ACPI/processor: Replace racy task affinity logic ACPI/processor: Fix error handling in __acpi_processor_start() sparc/sysfs: Replace racy task affinity logic powerpc/smp: Replace open coded task affinity logic ia64/sn/hwperf: Replace racy task affinity logic ia64/salinfo: Replace racy task affinity logic workqueue: Provide work_on_cpu_safe() ia64/topology: Remove cpus_allowed manipulation sched/fair: Move the PELT constants into a generated header sched/fair: Increase PELT accuracy for small tasks sched/fair: Fix comments sched/Documentation: Add 'sched-pelt' tool sched/fair: Fix corner case in __accumulate_sum() sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags() ...
2017-05-01Merge branch 'work.uaccess' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess unification updates from Al Viro: "This is the uaccess unification pile. It's _not_ the end of uaccess work, but the next batch of that will go into the next cycle. This one mostly takes copy_from_user() and friends out of arch/* and gets the zero-padding behaviour in sync for all architectures. Dealing with the nocache/writethrough mess is for the next cycle; fortunately, that's x86-only. Same for cleanups in iov_iter.c (I am sold on access_ok() in there, BTW; just not in this pile), same for reducing __copy_... callsites, strn*... stuff, etc. - there will be a pile about as large as this one in the next merge window. This one sat in -next for weeks. -3KLoC" * 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (96 commits) HAVE_ARCH_HARDENED_USERCOPY is unconditional now CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now m32r: switch to RAW_COPY_USER hexagon: switch to RAW_COPY_USER microblaze: switch to RAW_COPY_USER get rid of padding, switch to RAW_COPY_USER ia64: get rid of copy_in_user() ia64: sanitize __access_ok() ia64: get rid of 'segment' argument of __do_{get,put}_user() ia64: get rid of 'segment' argument of __{get,put}_user_check() ia64: add extable.h powerpc: get rid of zeroing, switch to RAW_COPY_USER esas2r: don't open-code memdup_user() alpha: fix stack smashing in old_adjtimex(2) don't open-code kernel_setsockopt() mips: switch to RAW_COPY_USER mips: get rid of tail-zeroing in primitives mips: make copy_from_user() zero tail explicitly mips: clean and reorder the forest of macros... mips: consolidate __invoke_... wrappers ...
2017-05-01block, dax: use correct format string in bdev_dax_supportedArnd Bergmann
The new message has an incorrect format string, causing a warning in some configurations: fs/block_dev.c: In function 'bdev_dax_supported': fs/block_dev.c:779:5: error: format '%d' expects argument of type 'int', but argument 2 has type 'long int' [-Werror=format=] "error: dax access failed (%d)", len); This changes it to use the correct %ld instead of %d. Fixes: 2093f2e9dfec ("block, dax: convert bdev_dax_supported() to dax_direct_access()") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-05-01Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer updates from Jens Axboe: - Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ was initially a fork of CFQ, but subsequently changed to implement fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant to be used on desktop type single drives, providing good fairness. From Paolo. - Add Kyber IO scheduler. This is a full multiqueue aware scheduler, using a scalable token based algorithm that throttles IO based on live completion IO stats, similary to blk-wbt. From Omar. - A series from Jan, moving users to separately allocated backing devices. This continues the work of separating backing device life times, solving various problems with hot removal. - A series of updates for lightnvm, mostly from Javier. Includes a 'pblk' target that exposes an open channel SSD as a physical block device. - A series of fixes and improvements for nbd from Josef. - A series from Omar, removing queue sharing between devices on mostly legacy drivers. This helps us clean up other bits, if we know that a queue only has a single device backing. This has been overdue for more than a decade. - Fixes for the blk-stats, and improvements to unify the stats and user windows. This both improves blk-wbt, and enables other users to register a need to receive IO stats for a device. From Omar. - blk-throttle improvements from Shaohua. This provides a scalable framework for implementing scalable priotization - particularly for blk-mq, but applicable to any type of block device. The interface is marked experimental for now. - Bucketized IO stats for IO polling from Stephen Bates. This improves efficiency of polled workloads in the presence of mixed block size IO. - A few fixes for opal, from Scott. - A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics. From a variety of folks, mostly Sagi and James Smart. - A series from Bart, improving our exposed info and capabilities from the blk-mq debugfs support. - A series from Christoph, cleaning up how handle WRITE_ZEROES. - A series from Christoph, cleaning up the block layer handling of how we track errors in a request. On top of being a nice cleanup, it also shrinks the size of struct request a bit. - Removal of mg_disk and hd (sorry Linus) by Christoph. The former was never used by platforms, and the latter has outlived it's usefulness. - Various little bug fixes and cleanups from a wide variety of folks. * 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits) block: hide badblocks attribute by default blk-mq: unify hctx delay_work and run_work block: add kblock_mod_delayed_work_on() blk-mq: unify hctx delayed_run_work and run_work nbd: fix use after free on module unload MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler blk-mq-sched: alloate reserved tags out of normal pool mtip32xx: use runtime tag to initialize command header scsi: Implement blk_mq_ops.show_rq() blk-mq: Add blk_mq_ops.show_rq() blk-mq: Show operation, cmd_flags and rq_flags names blk-mq: Make blk_flags_show() callers append a newline character blk-mq: Move the "state" debugfs attribute one level down blk-mq: Unregister debugfs attributes earlier blk-mq: Only unregister hctxs for which registration succeeded blk-mq-debugfs: Rename functions for registering and unregistering the mq directory blk-mq: Let blk_mq_debugfs_register() look up the queue name blk-mq: Register <dev>/queue/mq after having registered <dev>/queue ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset ide-pm: always pass 0 error to __blk_end_request_all ..
2017-04-30ext4: replace BUG_ON with WARN_ONCE in ext4_end_bio()Theodore Ts'o
Add fallback code and a WARN_ONCE() call instead of a BUG_ON() in the ext4_end_bio() function. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-30ext4: avoid unnecessary transaction stalls during writebackJan Kara
Currently ext4_writepages() submits all pages with transaction started. When no page needs block allocation or extent conversion we can submit all dirty pages in the inode while holding a single transaction handle and when device is congested this can take significant amount of time. Thus ext4_writepages() can block transaction commits for extended periods of time. Take for example a simple benchmark simulating PostgreSQL database (pgioperf in mmtest). The benchmark runs 16 processes doing random reads from a huge file, one process doing random writes to the huge file, and one process doing sequential writes to a small files and frequently running fsync. With unpatched kernel transaction commits take on average ~18s with standard deviation of ~41s, top 5 commit times are: 274.466639s, 126.467347s, 86.992429s, 34.351563s, 31.517653s. After this patch transaction commits take on average 0.1s with standard deviation of 0.15s, top 5 commit times are: 0.563792s, 0.519980s, 0.509841s, 0.471700s, 0.469899s [ Modified so we use an explicit do_map flag instead of relying on io_end not being allocated, the since io_end->inode is needed for I/O error handling. -- tytso ] Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-30fscrypt: Move key structure and constants to uapiJoe Richey
This commit exposes the necessary constants and structures for a userspace program to pass filesystem encryption keys into the keyring. The fscrypt_key structure was already part of the kernel ABI, this change just makes it so programs no longer have to redeclare these structures (like e4crypt in e2fsprogs currently does). Note that we do not expose the other FS_*_KEY_SIZE constants as they are not necessary. Only XTS is supported for contents_encryption_mode, so currently FS_MAX_KEY_SIZE bytes of key material must always be passed to the kernel. This commit also removes __packed from fscrypt_key as it does not contain any implicit padding and does not refer to an on-disk structure. Signed-off-by: Joe Richey <joerichey@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-30fscrypt: remove unnecessary checks for NULL operationsEric Biggers
The functions in fs/crypto/*.c are only called by filesystems configured with encryption support. Since the ->get_context(), ->set_context(), and ->empty_dir() operations are always provided in that case (and must be, otherwise there would be no way to get/set encryption policies, or in the case of ->get_context() even access encrypted files at all), there is no need to check for these operations being NULL and we can remove these unneeded checks. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-30ext4: preload block group descriptorsAndrew Perepechko
With enabled meta_bg option block group descriptors reading IO is not sequential and requires optimization. Signed-off-by: Andrew Perepechko <andrew.perepechko@seagate.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-30ext4: make ext4_shutdown() staticEric Biggers
Make the ext4_shutdown() function static, as suggested by running sparse ('make C=2 fs/ext4/'). This was the only such warning in fs/ext4/. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-30ext4: support GETFSMAP ioctlsDarrick J. Wong
Support the GETFSMAP ioctls so that we can use the xfs free space management tools to probe ext4 as well. Note that this is a partial implementation -- we only report fixed-location metadata and free space; everything else is reported as "unknown". Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-30ext4: evict inline data when writing to memory mapEric Biggers
Currently the case of writing via mmap to a file with inline data is not handled. This is maybe a rare case since it requires a writable memory map of a very small file, but it is trivial to trigger with on inline_data filesystem, and it causes the 'BUG_ON(ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA));' in ext4_writepages() to be hit: mkfs.ext4 -O inline_data /dev/vdb mount /dev/vdb /mnt xfs_io -f /mnt/file \ -c 'pwrite 0 1' \ -c 'mmap -w 0 1m' \ -c 'mwrite 0 1' \ -c 'fsync' kernel BUG at fs/ext4/inode.c:2723! invalid opcode: 0000 [#1] SMP CPU: 1 PID: 2532 Comm: xfs_io Not tainted 4.11.0-rc1-xfstests-00301-g071d9acf3d1f #633 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014 task: ffff88003d3a8040 task.stack: ffffc90000300000 RIP: 0010:ext4_writepages+0xc89/0xf8a RSP: 0018:ffffc90000303ca0 EFLAGS: 00010283 RAX: 0000028410000000 RBX: ffff8800383fa3b0 RCX: ffffffff812afcdc RDX: 00000a9d00000246 RSI: ffffffff81e660e0 RDI: 0000000000000246 RBP: ffffc90000303dc0 R08: 0000000000000002 R09: 869618e8f99b4fa5 R10: 00000000852287a2 R11: 00000000a03b49f4 R12: ffff88003808e698 R13: 0000000000000000 R14: 7fffffffffffffff R15: 7fffffffffffffff FS: 00007fd3e53094c0(0000) GS:ffff88003e400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fd3e4c51000 CR3: 000000003d554000 CR4: 00000000003406e0 Call Trace: ? _raw_spin_unlock+0x27/0x2a ? kvm_clock_read+0x1e/0x20 do_writepages+0x23/0x2c ? do_writepages+0x23/0x2c __filemap_fdatawrite_range+0x80/0x87 filemap_write_and_wait_range+0x67/0x8c ext4_sync_file+0x20e/0x472 vfs_fsync_range+0x8e/0x9f ? syscall_trace_enter+0x25b/0x2d0 vfs_fsync+0x1c/0x1e do_fsync+0x31/0x4a SyS_fsync+0x10/0x14 do_syscall_64+0x69/0x131 entry_SYSCALL64_slow_path+0x25/0x25 We could try to be smart and keep the inline data in this case, or at least support delayed allocation when allocating the block, but these solutions would be more complicated and don't seem worthwhile given how rare this case seems to be. So just fix the bug by calling ext4_convert_inline_data() when we're asked to make a page writable, so that any inline data gets evicted, with the block allocated immediately. Reported-by: Nick Alcock <nick.alcock@oracle.com> Cc: stable@vger.kernel.org Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-30ext4: remove ext4_xattr_check_entry()Eric Biggers
ext4_xattr_check_entry() was redundant with validation of the full xattr entries list in ext4_xattr_check_entries(), which all callers also did. ext4_xattr_check_entry() also didn't actually do correct validation; specifically, it never checked that the value doesn't overlap the xattr names, nor did it account for padding when checking whether the xattr value overflows the available space. So remove it to eliminate any potential confusion. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-29ext4: rename ext4_xattr_check_names() to ext4_xattr_check_entries()Eric Biggers
ext4_xattr_check_names() actually validates both the xattr names and values, not just the names. So rename it to ext4_xattr_check_entries() to avoid confusion. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-29ext4: merge ext4_xattr_list() into ext4_listxattr()Eric Biggers
There's no difference between ext4_xattr_list() and ext4_listxattr(), so merge them together and just have ext4_listxattr(). Some years ago they took different arguments, but that's no longer the case. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-29ext4: constify static data that is never modifiedEric Biggers
Constify static data in ext4 that is never (intentionally) modified so that it is placed in .rodata and benefits from memory protection. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-29ext4: trim return value and 'dir' argument from ext4_insert_dentry()Eric Biggers
In the initial implementation of ext4 encryption, the filename was encrypted in ext4_insert_dentry(), which could fail and also required access to the 'dir' inode. Since then ext4 filename encryption has been changed to encrypt the filename earlier, so we can revert the additions to ext4_insert_dentry(). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-29jbd2: fix dbench4 performance regression for 'nobarrier' mountsJan Kara
Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as synchronous" removed REQ_SYNC flag from WRITE_FUA implementation. Since JBD2 strips REQ_FUA and REQ_FLUSH flags from submitted IO when the filesystem is mounted with nobarrier mount option, journal superblock writes ended up being async writes after this patch and that caused heavy performance regression for dbench4 benchmark with high number of processes. In my test setup with HP RAID array with non-volatile write cache and 32 GB ram, dbench4 runs with 8 processes regressed by ~25%. Fix the problem by making sure journal superblock writes are always treated as synchronous since they generally block progress of the journalling machinery and thus the whole filesystem. Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3 CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-29jbd2: Fix lockdep splat with generic/270 testJan Kara
I've hit a lockdep splat with generic/270 test complaining that: 3216.fsstress.b/3533 is trying to acquire lock: (jbd2_handle){++++..}, at: [<ffffffff813152e0>] jbd2_log_wait_commit+0x0/0x150 but task is already holding lock: (jbd2_handle){++++..}, at: [<ffffffff8130bd3b>] start_this_handle+0x35b/0x850 The underlying problem is that jbd2_journal_force_commit_nested() (called from ext4_should_retry_alloc()) may get called while a transaction handle is started. In such case it takes care to not wait for commit of the running transaction (which would deadlock) but only for a commit of a transaction that is already committing (which is safe as that doesn't wait for any filesystem locks). In fact there are also other callers of jbd2_log_wait_commit() that take care to pass tid of a transaction that is already committing and for those cases, the lockdep instrumentation is too restrictive and leading to false positive reports. Fix the problem by calling jbd2_might_wait_for_commit() from jbd2_log_wait_commit() only if the transaction isn't already committing. Fixes: 1eaa566d368b214d99cbb973647c1b0b8102a9ae Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-29fs: compat: Remove warning from COMPATIBLE_IOCTLMark Charlebois
cmd in COMPATIBLE_IOCTL is always a u32, so cast it so there isn't a warning about an overflow in XFORM. From: Mark Charlebois <charlebm@gmail.com> Signed-off-by: Mark Charlebois <charlebm@gmail.com> Signed-off-by: Behan Webster <behanw@converseincode.com> Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-29remove pointless extern of atime_need_update_rcu()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-29pNFS: Ensure we commit the layout if it has been invalidatedTrond Myklebust
If the layout is being invalidated on the server, then we must invoke nfs_commit_inode() to ensure any commits to the DS get cleared out. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-29pNFS: Don't send COMMITs to the DSes if the server invalidated our layoutTrond Myklebust
If the layout was invalidated, then assume we should requeue all the pending writes for the DS in question. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-29pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure pathTrond Myklebust
If the attempt to write through pNFS fails, we need to use the same failure semantics as for the read path: If the FF_FLAGS_NO_IO_THRU_MDS flag is set or we have sufficient valid DSes, then we must retry through pNFS Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-28proc: Fix unbalanced hard link numbersTakashi Iwai
proc_create_mount_point() forgot to increase the parent's nlink, and it resulted in unbalanced hard link numbers, e.g. /proc/fs shows one less than expected. Fixes: eb6d38d5427b ("proc: Allow creating permanently empty directories...") Cc: stable@vger.kernel.org Reported-by: Tristan Ye <tristan.ye@suse.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2017-04-28Merge branch 'for-linus-4.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fix from Chris Mason: "We have one more fix for btrfs. This gets rid of a new WARN_ON from rc1 that ended up making more noise than we really want. The larger fix for the underflow got delayed a bit and it's better for now to put it under CONFIG_BTRFS_DEBUG" * 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: qgroup: move noisy underflow warning to debugging build
2017-04-28pNFS: Ensure we check layout validity before marking it for returnTrond Myklebust
pnfs_error_mark_layout_for_return needs to check that the layout is valid before calling pnfs_set_plh_return_info(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-28NFS4.1 handle interrupted slot reuse from ERR_DELAYOlga Kornievskaia
If the RPC slot was interrupted and server replied to the next operation on the "reused" slot with ERR_DELAY, don't clear out the "interrupted" flag until we properly recover. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-28NFSv4: check return value of xdr_inline_decodePan Bian
Function xdr_inline_decode() will return a NULL pointer if the input buffer does not have long enough buffer to decode nbytes of data. However, in function decode_op_map(), the return value of xdr_inline_decode() is not validated before it is used. This patch adds a check to the return value of xdr_inline_decode(). Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-28nfs/filelayout: fix NULL pointer dereference in fl_pnfs_update_layout()Artem Savkov
Calling pnfs_put_lset on an IS_ERR pointer results in a NULL pointer dereference like the one below. At the same time the check of retvalue of filelayout_check_deviceid() sets lseg to error, but does not free it before that. [ 3000.636161] BUG: unable to handle kernel NULL pointer dereference at 000000000000003c [ 3000.636970] IP: pnfs_put_lseg+0x29/0x100 [nfsv4] [ 3000.637420] PGD 4f23b067 [ 3000.637421] PUD 4a0f4067 [ 3000.637679] PMD 0 [ 3000.637937] [ 3000.638287] Oops: 0000 [#1] SMP [ 3000.638591] Modules linked in: nfs_layout_nfsv41_files nfsv3 nfnetlink_queue nfnetlink_log nfnetlink bluetooth rfkill rpcsec_gss_krb5 nfsv4 nfs fscache binfmt_misc arc4 md4 nls_utf8 cifs ccm dns_resolver rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr virtio_balloon ppdev virtio_rng parport_pc i2c_piix4 parport acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c ata_generic pata_acpi virtio_blk virtio_net cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crc32c_intel ata_piix ttm libata drm serio_raw [ 3000.645245] i2c_core virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: xt_u32] [ 3000.646360] CPU: 1 PID: 26402 Comm: date Not tainted 4.11.0-rc7.1.el7.test.x86_64 #1 [ 3000.647092] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 3000.647638] task: ffff8800415ada00 task.stack: ffffc90000ff0000 [ 3000.648207] RIP: 0010:pnfs_put_lseg+0x29/0x100 [nfsv4] [ 3000.648696] RSP: 0018:ffffc90000ff39b8 EFLAGS: 00010246 [ 3000.649193] RAX: 0000000000000000 RBX: fffffffffffffff4 RCX: 00000000000d43be [ 3000.649859] RDX: 00000000000d43bd RSI: 0000000000000000 RDI: fffffffffffffff4 [ 3000.650530] RBP: ffffc90000ff39d8 R08: 000000000001e320 R09: ffffffffa05c35ce [ 3000.651203] R10: ffff88007fd1e320 R11: ffffea0001283d80 R12: 0000000001400040 [ 3000.651875] R13: ffff88004f77d9f0 R14: ffffc90000ff3cd8 R15: ffff8800417ade00 [ 3000.652546] FS: 00007fac4d5cd740(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 [ 3000.653304] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3000.653849] CR2: 000000000000003c CR3: 000000004f080000 CR4: 00000000000406e0 [ 3000.654527] Call Trace: [ 3000.654771] fl_pnfs_update_layout.constprop.20+0x10c/0x150 [nfs_layout_nfsv41_files] [ 3000.655505] filelayout_pg_init_write+0x21d/0x270 [nfs_layout_nfsv41_files] [ 3000.656195] __nfs_pageio_add_request+0x11c/0x490 [nfs] [ 3000.656698] nfs_pageio_add_request+0xac/0x260 [nfs] [ 3000.657180] nfs_do_writepage+0x109/0x2e0 [nfs] [ 3000.657616] nfs_writepages_callback+0x16/0x30 [nfs] [ 3000.658096] write_cache_pages+0x26f/0x510 [ 3000.658495] ? nfs_do_writepage+0x2e0/0x2e0 [nfs] [ 3000.658946] ? _raw_spin_unlock_bh+0x1e/0x20 [ 3000.659357] ? wb_wakeup_delayed+0x5f/0x70 [ 3000.659748] ? __mark_inode_dirty+0x2eb/0x360 [ 3000.660170] nfs_writepages+0x84/0xd0 [nfs] [ 3000.660575] ? nfs_updatepage+0x571/0xb70 [nfs] [ 3000.661012] do_writepages+0x1e/0x30 [ 3000.661358] __filemap_fdatawrite_range+0xc6/0x100 [ 3000.661819] filemap_write_and_wait_range+0x41/0x90 [ 3000.662292] nfs_file_fsync+0x34/0x1f0 [nfs] [ 3000.662704] vfs_fsync_range+0x3d/0xb0 [ 3000.663065] vfs_fsync+0x1c/0x20 [ 3000.663385] nfs4_file_flush+0x57/0x80 [nfsv4] [ 3000.663813] filp_close+0x2f/0x70 [ 3000.664132] __close_fd+0x9a/0xc0 [ 3000.664453] SyS_close+0x23/0x50 [ 3000.664785] do_syscall_64+0x67/0x180 [ 3000.665162] entry_SYSCALL64_slow_path+0x25/0x25 [ 3000.665600] RIP: 0033:0x7fac4d0e1e90 [ 3000.665946] RSP: 002b:00007ffd54e90c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ 3000.666679] RAX: ffffffffffffffda RBX: 00007fac4d3b5400 RCX: 00007fac4d0e1e90 [ 3000.667349] RDX: 0000000000000000 RSI: 00007fac4d5d9000 RDI: 0000000000000001 [ 3000.668031] RBP: 0000000000000000 R08: 00007fac4d3b6a00 R09: 00007fac4d5cd740 [ 3000.668709] R10: 00007ffd54e909e0 R11: 0000000000000246 R12: 0000000000000000 [ 3000.669385] R13: 00007fac4d3b5e80 R14: 0000000000000000 R15: 0000000000000000 [ 3000.670061] Code: 00 00 66 66 66 66 90 55 48 85 ff 48 89 e5 41 56 41 55 41 54 53 48 89 fb 0f 84 97 00 00 00 f6 05 16 8f bc ff 10 0f 85 a6 00 00 00 <4c> 8b 63 48 48 8d 7b 38 49 8b 84 24 90 00 00 00 4c 8d a8 88 00 [ 3000.671831] RIP: pnfs_put_lseg+0x29/0x100 [nfsv4] RSP: ffffc90000ff39b8 [ 3000.672462] CR2: 000000000000003c Signed-off-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-28xfs: wait on new inodes during quotaoff dquot releaseBrian Foster
The quotaoff operation has a race with inode allocation that results in a livelock. An inode allocation that occurs before the quota status flags are updated acquires the appropriate dquots for the inode via xfs_qm_vop_dqalloc(). It then inserts the XFS_INEW inode into the perag radix tree, sometime later attaches the dquots to the inode and finally clears the XFS_INEW flag. Quotaoff expects to release the dquots from all inodes in the filesystem via xfs_qm_dqrele_all_inodes(). This invokes the AG inode iterator, which skips inodes in the XFS_INEW state because they are not fully constructed. If the scan occurs after dquots have been attached to an inode, but before XFS_INEW is cleared, the newly allocated inode will continue to hold a reference to the applicable dquots. When quotaoff invokes xfs_qm_dqpurge_all(), the reference count of those dquot(s) remain elevated and the dqpurge scan spins indefinitely. To address this problem, update the xfs_qm_dqrele_all_inodes() scan to wait on inodes marked on the XFS_INEW state. We wait on the inodes explicitly rather than skip and retry to avoid continuous retry loops due to a parallel inode allocation workload. Since quotaoff updates the quota state flags and uses a synchronous transaction before the dqrele scan, and dquots are attached to inodes after radix tree insertion iff quota is enabled, one INEW waiting pass through the AG guarantees that the scan has processed all inodes that could possibly hold dquot references. Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-28xfs: update ag iterator to support wait on new inodesBrian Foster
The AG inode iterator currently skips new inodes as such inodes are inserted into the inode radix tree before they are fully constructed. Certain contexts require the ability to wait on the construction of new inodes, however. The fs-wide dquot release from the quotaoff sequence is an example of this. Update the AG inode iterator to support the ability to wait on inodes flagged with XFS_INEW upon request. Create a new xfs_inode_ag_iterator_flags() interface and support a set of iteration flags to modify the iteration behavior. When the XFS_AGITER_INEW_WAIT flag is set, include XFS_INEW flags in the radix tree inode lookup and wait on them before the callback is executed. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-28xfs: support ability to wait on new inodesBrian Foster
Inodes that are inserted into the perag tree but still under construction are flagged with the XFS_INEW bit. Most contexts either skip such inodes when they are encountered or have the ability to handle them. The runtime quotaoff sequence introduces a context that must wait for construction of such inodes to correctly ensure that all dquots in the fs are released. In anticipation of this, support the ability to wait on new inodes. Wake the appropriate bit when XFS_INEW is cleared. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-28xfs: publish UUID in struct super_blockAmir Goldstein
Copy the uuid of the filesystem to struct super_block s_uuid field, as several other filesystems already do. Copy regardless of the nouuid mount option, because other filesystems also do not guaranty uniqueness of the s_uuid field in super_block struct. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-28cifs: don't check for failure from mempool_alloc()NeilBrown
mempool_alloc() cannot fail if the gfp flags allow it to sleep, and both GFP_FS allows for sleeping. So these tests of the return value from mempool_alloc() cannot be needed. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Steve French <smfrench@gmail.com>
2017-04-28Do not return number of bytes written for ioctl CIFS_IOC_COPYCHUNK_FILESachin Prabhu
commit 620d8745b35d ("Introduce cifs_copy_file_range()") changes the behaviour of the cifs ioctl call CIFS_IOC_COPYCHUNK_FILE. In case of successful writes, it now returns the number of bytes written. This return value is treated as an error by the xfstest cifs/001. Depending on the errno set at that time, this may or may not result in the test failing. The patch fixes this by setting the return value to 0 in case of successful writes. Fixes: commit 620d8745b35d ("Introduce cifs_copy_file_range()") Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Acked-by: Pavel Shilovsky <pshilov@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <smfrench@gmail.com>
2017-04-28Fix match_prepath()Sachin Prabhu
Incorrect return value for shares not using the prefix path means that we will never match superblocks for these shares. Fixes: commit c1d8b24d1819 ("Compare prepaths when comparing superblocks") Cc: stable@vger.kernel.org Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com>
2017-04-27pstore: Solve lockdep warning by moving inode locksKees Cook
Lockdep complains about a possible deadlock between mount and unlink (which is technically impossible), but fixing this improves possible future multiple-backend support, and keeps locking in the right order. The lockdep warning could be triggered by unlinking a file in the pstore filesystem: -> #1 (&sb->s_type->i_mutex_key#14){++++++}: lock_acquire+0xc9/0x220 down_write+0x3f/0x70 pstore_mkfile+0x1f4/0x460 pstore_get_records+0x17a/0x320 pstore_fill_super+0xa4/0xc0 mount_single+0x89/0xb0 pstore_mount+0x13/0x20 mount_fs+0xf/0x90 vfs_kern_mount+0x66/0x170 do_mount+0x190/0xd50 SyS_mount+0x90/0xd0 entry_SYSCALL_64_fastpath+0x1c/0xb1 -> #0 (&psinfo->read_mutex){+.+.+.}: __lock_acquire+0x1ac0/0x1bb0 lock_acquire+0xc9/0x220 __mutex_lock+0x6e/0x990 mutex_lock_nested+0x16/0x20 pstore_unlink+0x3f/0xa0 vfs_unlink+0xb5/0x190 do_unlinkat+0x24c/0x2a0 SyS_unlinkat+0x16/0x30 entry_SYSCALL_64_fastpath+0x1c/0xb1 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sb->s_type->i_mutex_key#14); lock(&psinfo->read_mutex); lock(&sb->s_type->i_mutex_key#14); lock(&psinfo->read_mutex); Reported-by: Marta Lofstedt <marta.lofstedt@intel.com> Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Namhyung Kim <namhyung@kernel.org>
2017-04-27NFSv4: Fix callback server shutdownTrond Myklebust
We want to use kthread_stop() in order to ensure the threads are shut down before we tear down the nfs_callback_info in nfs_callback_down. Tested-and-reviewed-by: Kinglong Mee <kinglongmee@gmail.com> Reported-by: Kinglong Mee <kinglongmee@gmail.com> Fixes: bb6aeba736ba9 ("NFSv4.x: Switch to using svc_set_num_threads()...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-27NFSv4.x/callback: Create the callback service through svc_create_pooledKinglong Mee
As the comments for svc_set_num_threads() said, " Destroying threads relies on the service threads filling in rqstp->rq_task, which only the nfs ones do. Assumes the serv has been created using svc_create_pooled()." If creating service through svc_create(), the svc_pool_map_put() will be called in svc_destroy(), but the pool map isn't used. So that, the reference of pool map will be drop, the next using of pool map will get a zero npools. [ 137.992130] divide error: 0000 [#1] SMP [ 137.992148] Modules linked in: nfsd(E) nfsv4 nfs fscache fuse tun bridge stp llc ip_set nfnetlink vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event vmw_balloon coretemp crct10dif_pclmul crc32_pclmul ppdev ghash_clmulni_intel intel_rapl_perf joydev snd_ens1371 gameport snd_ac97_codec ac97_bus snd_seq snd_pcm snd_rawmidi snd_timer snd_seq_device snd soundcore parport_pc parport nfit acpi_cpufreq tpm_tis tpm_tis_core tpm vmw_vmci i2c_piix4 shpchp auth_rpcgss nfs_acl lockd(E) grace sunrpc(E) xfs libcrc32c vmwgfx drm_kms_helper ttm crc32c_intel drm e1000 mptspi scsi_transport_spi serio_raw mptscsih mptbase ata_generic pata_acpi [last unloaded: nfsd] [ 137.992336] CPU: 0 PID: 4514 Comm: rpc.nfsd Tainted: G E 4.11.0-rc8+ #536 [ 137.992777] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015 [ 137.993757] task: ffff955984101d00 task.stack: ffff9873c2604000 [ 137.994231] RIP: 0010:svc_pool_for_cpu+0x2b/0x80 [sunrpc] [ 137.994768] RSP: 0018:ffff9873c2607c18 EFLAGS: 00010246 [ 137.995227] RAX: 0000000000000000 RBX: ffff95598376f000 RCX: 0000000000000002 [ 137.995673] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9559944aec00 [ 137.996156] RBP: ffff9873c2607c18 R08: ffff9559944aec28 R09: 0000000000000000 [ 137.996609] R10: 0000000001080002 R11: 0000000000000000 R12: ffff95598376f010 [ 137.997063] R13: ffff95598376f018 R14: ffff9559944aec28 R15: ffff9559944aec00 [ 137.997584] FS: 00007f755529eb40(0000) GS:ffff9559bb600000(0000) knlGS:0000000000000000 [ 137.998048] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 137.998548] CR2: 000055f3aecd9660 CR3: 0000000084290000 CR4: 00000000001406f0 [ 137.999052] Call Trace: [ 137.999517] svc_xprt_do_enqueue+0xef/0x260 [sunrpc] [ 138.000028] svc_xprt_received+0x47/0x90 [sunrpc] [ 138.000487] svc_add_new_perm_xprt+0x76/0x90 [sunrpc] [ 138.000981] svc_addsock+0x14b/0x200 [sunrpc] [ 138.001424] ? recalc_sigpending+0x1b/0x50 [ 138.001860] ? __getnstimeofday64+0x41/0xd0 [ 138.002346] ? do_gettimeofday+0x29/0x90 [ 138.002779] write_ports+0x255/0x2c0 [nfsd] [ 138.003202] ? _copy_from_user+0x4e/0x80 [ 138.003676] ? write_recoverydir+0x100/0x100 [nfsd] [ 138.004098] nfsctl_transaction_write+0x48/0x80 [nfsd] [ 138.004544] __vfs_write+0x37/0x160 [ 138.004982] ? selinux_file_permission+0xd7/0x110 [ 138.005401] ? security_file_permission+0x3b/0xc0 [ 138.005865] vfs_write+0xb5/0x1a0 [ 138.006267] SyS_write+0x55/0xc0 [ 138.006654] entry_SYSCALL_64_fastpath+0x1a/0xa9 [ 138.007071] RIP: 0033:0x7f7554b9dc30 [ 138.007437] RSP: 002b:00007ffc9f92c788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 138.007807] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f7554b9dc30 [ 138.008168] RDX: 0000000000000002 RSI: 00005640cd536640 RDI: 0000000000000003 [ 138.008573] RBP: 00007ffc9f92c780 R08: 0000000000000001 R09: 0000000000000002 [ 138.008918] R10: 0000000000000064 R11: 0000000000000246 R12: 0000000000000004 [ 138.009254] R13: 00005640cdbf77a0 R14: 00005640cdbf7720 R15: 00007ffc9f92c238 [ 138.009610] Code: 0f 1f 44 00 00 48 8b 87 98 00 00 00 55 48 89 e5 48 83 78 08 00 74 10 8b 05 07 42 02 00 83 f8 01 74 40 83 f8 02 74 19 31 c0 31 d2 <f7> b7 88 00 00 00 5d 89 d0 48 c1 e0 07 48 03 87 90 00 00 00 c3 [ 138.010664] RIP: svc_pool_for_cpu+0x2b/0x80 [sunrpc] RSP: ffff9873c2607c18 [ 138.011061] ---[ end trace b3468224cafa7d11 ]--- Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>