summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2018-12-03pstore/ram: Simplify ramoops_get_next_prz() argumentsJoel Fernandes (Google)
(1) remove type argument from ramoops_get_next_prz() Since we store the type of the prz when we initialize it, we no longer need to pass it again in ramoops_get_next_prz() since we can just use that to setup the pstore record. So lets remove it from the argument list. (2) remove max argument from ramoops_get_next_prz() Looking at the code flow, the 'max' checks are already being done on the prz passed to ramoops_get_next_prz(). Lets remove it to simplify this function and reduce its arguments. (3) further reduce ramoops_get_next_prz() arguments by passing record Both the id and type fields of a pstore_record are set by ramoops_get_next_prz(). So we can just pass a pointer to the pstore_record instead of passing individual elements. This results in cleaner more readable code and fewer lines. In addition lets also remove the 'update' argument since we can detect that. Changes are squashed into a single patch to reduce fixup conflicts. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore: Map PSTORE_TYPE_* to stringsJoel Fernandes (Google)
In later patches we will need to map types to names, so create a constant table for that which can also be used in different parts of old and new code. This saves the type in the PRZ which will be useful in later patches. Instead of having an explicit PSTORE_TYPE_UNKNOWN, just use ..._MAX. This includes removing the now redundant filename templates which can use a single format string. Also, there's no reason to limit the "is it still compressed?" test to only PSTORE_TYPE_DMESG when building the pstorefs filename. Records are zero-initialized, so a backend would need to have explicitly set compressed=1. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore: Improve and update some comments and status outputKees Cook
This improves and updates some comments: - dump handler comment out of sync from calling convention - fix kern-doc typo and improves status output: - reminder that only kernel crash dumps are compressed - do not be silent about ECC infrastructure failures Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore/ram: Add kern-doc for struct persistent_ram_zoneKees Cook
The struct persistent_ram_zone wasn't well documented. This adds kern-doc for it. Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore/ram: Report backend assignments with finer granularityKees Cook
In order to more easily perform automated regression testing, this adds pr_debug() calls to report each prz allocation which can then be verified against persistent storage. Specifically, seeing the dividing line between header, data, any ECC bytes. (And the general assignment output is updated to remove the bogus ECC blocksize which isn't actually recorded outside the prz instance.) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore/ram: Standardize module name in ramoopsKees Cook
With both ram.c and ram_core.c built into ramoops.ko, it doesn't make sense to have differing pr_fmt prefixes. This fixes ram_core.c to use the module name (as ram.c already does). Additionally improves region reservation error to include the region name. Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore: Avoid duplicate call of persistent_ram_zap()Peng Wang
When initialing a prz, if invalid data is found (no PERSISTENT_RAM_SIG), the function call path looks like this: ramoops_init_prz -> persistent_ram_new -> persistent_ram_post_init -> persistent_ram_zap persistent_ram_zap As we can see, persistent_ram_zap() is called twice. We can avoid this by adding an option to persistent_ram_new(), and only call persistent_ram_zap() when it is needed. Signed-off-by: Peng Wang <wangpeng15@xiaomi.com> [kees: minor tweak to exit path and commit log] Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore: Remove needless lock during console writesKees Cook
Since the console writer does not use the preallocated crash dump buffer any more, there is no reason to perform locking around it. Fixes: 70ad35db3321 ("pstore: Convert console write to use ->write_buf") Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2018-12-03pstore: Do not use crash buffer for decompressionKees Cook
The pre-allocated compression buffer used for crash dumping was also being used for decompression. This isn't technically safe, since it's possible the kernel may attempt a crashdump while pstore is populating the pstore filesystem (and performing decompression). Instead, just allocate a separate buffer for decompression. Correctness is preferred over performance here. Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03Merge branch 'for-linus/pstore' into for-next/pstoreKees Cook
2018-12-03dlm: fix invalid cluster name warningDavid Teigland
The warning added in commit 3b0e761ba83 "dlm: print log message when cluster name is not set" did not account for the fact that lockspaces created from userland do not supply a cluster name, so bogus warnings are printed every time a userland lockspace is created. Signed-off-by: David Teigland <teigland@redhat.com>
2018-12-03sysfs: constify sysfs create/remove files harderJani Nikula
Let the passed in array be const (and thus placed in rodata) instead of a mutable array of const pointers. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181004143750.30880-1-jani.nikula@intel.com
2018-12-03dlm: NULL check before some freeing functions is not neededThomas Meyer
NULL check before some freeing functions is not needed. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: David Teigland <teigland@redhat.com>
2018-12-03fuse: fix revalidation of attributes for permission checkMiklos Szeredi
fuse_invalidate_attr() now sets fi->inval_mask instead of fi->i_time, hence we need to check the inval mask in fuse_permission() as well. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 2f1e81965fd0 ("fuse: allow fine grained attr cache invaldation")
2018-12-03fuse: fix fsync on directoryMiklos Szeredi
Commit ab2257e9941b ("fuse: reduce size of struct fuse_inode") moved parts of fields related to writeback on regular file and to directory caching into a union. However fuse_fsync_common() called from fuse_dir_fsync() touches some writeback related fields, resulting in a crash. Move writeback related parts from fuse_fsync_common() to fuse_fysnc(). Reported-by: Brett Girton <btgirton@gmail.com> Tested-by: Brett Girton <btgirton@gmail.com> Fixes: ab2257e9941b ("fuse: reduce size of struct fuse_inode") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-12-03Merge 4.20-rc5 into driver-core-nextGreg Kroah-Hartman
We need the fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-02nfs: don't dirty kernel pages read by direct-ioDave Kleikamp
When we use direct_IO with an NFS backing store, we can trigger a WARNING in __set_page_dirty(), as below, since we're dirtying the page unnecessarily in nfs_direct_read_completion(). To fix, replicate the logic in commit 53cbf3b157a0 ("fs: direct-io: don't dirtying pages for ITER_BVEC/ITER_KVEC direct read"). Other filesystems that implement direct_IO handle this; most use blockdev_direct_IO(). ceph and cifs have similar logic. mount 127.0.0.1:/export /nfs dd if=/dev/zero of=/nfs/image bs=1M count=200 losetup --direct-io=on -f /nfs/image mkfs.btrfs /dev/loop0 mount -t btrfs /dev/loop0 /mnt/ kernel: WARNING: CPU: 0 PID: 8067 at fs/buffer.c:580 __set_page_dirty+0xaf/0xd0 kernel: Modules linked in: loop(E) nfsv3(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) fuse(E) tun(E) ip6t_rpfilter(E) ipt_REJECT(E) nf_ kernel: snd_seq(E) snd_seq_device(E) snd_pcm(E) video(E) snd_timer(E) snd(E) soundcore(E) ip_tables(E) xfs(E) libcrc32c(E) sd_mod(E) sr_mod(E) cdrom(E) ata_generic(E) pata_acpi(E) crc32c_intel(E) ahci(E) li kernel: CPU: 0 PID: 8067 Comm: kworker/0:2 Tainted: G E 4.20.0-rc1.master.20181111.ol7.x86_64 #1 kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 kernel: Workqueue: nfsiod rpc_async_release [sunrpc] kernel: RIP: 0010:__set_page_dirty+0xaf/0xd0 kernel: Code: c3 48 8b 02 f6 c4 04 74 d4 48 89 df e8 ba 05 f7 ff 48 89 c6 eb cb 48 8b 43 08 a8 01 75 1f 48 89 d8 48 8b 00 a8 04 74 02 eb 87 <0f> 0b eb 83 48 83 e8 01 eb 9f 48 83 ea 01 0f 1f 00 eb 8b 48 83 e8 kernel: RSP: 0000:ffffc1c8825b7d78 EFLAGS: 00013046 kernel: RAX: 000fffffc0020089 RBX: fffff2b603308b80 RCX: 0000000000000001 kernel: RDX: 0000000000000001 RSI: ffff9d11478115c8 RDI: ffff9d11478115d0 kernel: RBP: ffffc1c8825b7da0 R08: 0000646f6973666e R09: 8080808080808080 kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff9d11478115d0 kernel: R13: ffff9d11478115c8 R14: 0000000000003246 R15: 0000000000000001 kernel: FS: 0000000000000000(0000) GS:ffff9d115ba00000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 00007f408686f640 CR3: 0000000104d8e004 CR4: 00000000000606f0 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 kernel: Call Trace: kernel: __set_page_dirty_buffers+0xb6/0x110 kernel: set_page_dirty+0x52/0xb0 kernel: nfs_direct_read_completion+0xc4/0x120 [nfs] kernel: nfs_pgio_release+0x10/0x20 [nfs] kernel: rpc_free_task+0x30/0x70 [sunrpc] kernel: rpc_async_release+0x12/0x20 [sunrpc] kernel: process_one_work+0x174/0x390 kernel: worker_thread+0x4f/0x3e0 kernel: kthread+0x102/0x140 kernel: ? drain_workqueue+0x130/0x130 kernel: ? kthread_stop+0x110/0x110 kernel: ret_from_fork+0x35/0x40 kernel: ---[ end trace 01341980905412c9 ]--- Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> [forward-ported to v4.20] Signed-off-by: Calum Mackay <calum.mackay@oracle.com> Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-12-02flexfiles: enforce per-mirror stateid only for v4 DSesTigran Mkrtchyan
Since commit bb21ce0ad227 we always enforce per-mirror stateid. However, this makes sense only for v4+ servers. Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-12-02jffs2: Fix use of uninitialized delayed_work, lockdep breakageDaniel Santos
jffs2_sync_fs makes the assumption that if CONFIG_JFFS2_FS_WRITEBUFFER is defined then a write buffer is available and has been initialized. However, this does is not the case when the mtd device has no out-of-band buffer: int jffs2_nand_flash_setup(struct jffs2_sb_info *c) { if (!c->mtd->oobsize) return 0; ... The resulting call to cancel_delayed_work_sync passing a uninitialized (but zeroed) delayed_work struct forces lockdep to become disabled. [ 90.050639] overlayfs: upper fs does not support tmpfile. [ 90.652264] INFO: trying to register non-static key. [ 90.662171] the code is fine but needs lockdep annotation. [ 90.673090] turning off the locking correctness validator. [ 90.684021] CPU: 0 PID: 1762 Comm: mount_root Not tainted 4.14.63 #0 [ 90.696672] Stack : 00000000 00000000 80d8f6a2 00000038 805f0000 80444600 8fe364f4 805dfbe7 [ 90.713349] 80563a30 000006e2 8068370c 00000001 00000000 00000001 8e2fdc48 ffffffff [ 90.730020] 00000000 00000000 80d90000 00000000 00000106 00000000 6465746e 312e3420 [ 90.746690] 6b636f6c 03bf0000 f8000000 20676e69 00000000 80000000 00000000 8e2c2a90 [ 90.763362] 80d90000 00000001 00000000 8e2c2a90 00000003 80260dc0 08052098 80680000 [ 90.780033] ... [ 90.784902] Call Trace: [ 90.789793] [<8000f0d8>] show_stack+0xb8/0x148 [ 90.798659] [<8005a000>] register_lock_class+0x270/0x55c [ 90.809247] [<8005cb64>] __lock_acquire+0x13c/0xf7c [ 90.818964] [<8005e314>] lock_acquire+0x194/0x1dc [ 90.828345] [<8003f27c>] flush_work+0x200/0x24c [ 90.837374] [<80041dfc>] __cancel_work_timer+0x158/0x210 [ 90.847958] [<801a8770>] jffs2_sync_fs+0x20/0x54 [ 90.857173] [<80125cf4>] iterate_supers+0xf4/0x120 [ 90.866729] [<80158fc4>] sys_sync+0x44/0x9c [ 90.875067] [<80014424>] syscall_common+0x34/0x58 Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Reviewed-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-12-01Merge branches 'bug.2018.11.12a', 'consolidate.2018.12.01a', ↵Paul E. McKenney
'doc.2018.11.12a', 'fixes.2018.11.12a', 'initrd.2018.11.08b', 'sil.2018.11.12a' and 'srcu.2018.11.27a' into HEAD bug.2018.11.12a: Get rid of BUG_ON() and friends consolidate.2018.12.01a: Continued RCU flavor-consolidation cleanup doc.2018.11.12a: Documentation updates fixes.2018.11.12a: Miscellaneous fixes initrd.2018.11.08b: Automate creation of rcutorture initrd sil.2018.11.12a: Remove more spin_unlock_wait() calls
2018-12-01Merge tag 'for-linus-20181201' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer fixes from Jens Axboe: - Single range elevator discard merge fix, that caused crashes (Ming) - Fix for a regression in O_DIRECT, where we could potentially lose the error value (Maximilian Heyne) - NVMe pull request from Christoph, with little fixes all over the map for NVMe. * tag 'for-linus-20181201' of git://git.kernel.dk/linux-block: block: fix single range discard merge nvme-rdma: fix double freeing of async event data nvme: flush namespace scanning work just before removing namespaces nvme: warn when finding multi-port subsystems without multipathing enabled fs: fix lost error code in dio_complete nvme-pci: fix surprise removal nvme-fc: initialize nvme_req(rq)->ctrl after calling __nvme_fc_init_request() nvme: Free ctrl device name on init failure
2018-11-30Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "31 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (31 commits) ocfs2: fix potential use after free mm/khugepaged: fix the xas_create_range() error path mm/khugepaged: collapse_shmem() do not crash on Compound mm/khugepaged: collapse_shmem() without freezing new_page mm/khugepaged: minor reorderings in collapse_shmem() mm/khugepaged: collapse_shmem() remember to clear holes mm/khugepaged: fix crashes due to misaccounted holes mm/khugepaged: collapse_shmem() stop if punched or truncated mm/huge_memory: fix lockdep complaint on 32-bit i_size_read() mm/huge_memory: splitting set mapping+index before unfreeze mm/huge_memory: rename freeze_page() to unmap_page() initramfs: clean old path before creating a hardlink kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace psi: make disabling/enabling easier for vendor kernels proc: fixup map_files test on arm debugobjects: avoid recursive calls with kmemleak userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set userfaultfd: shmem: add i_size checks userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem ...
2018-11-30Merge tag 'fscache-fixes-20181130' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fscache and cachefiles fixes from David Howells: "Misc fixes: - Fix an assertion failure at fs/cachefiles/xattr.c:138 caused by a race between a cache object lookup failing and someone attempting to reenable that object, thereby triggering an update of the object's attributes. - Fix an assertion failure at fs/fscache/operation.c:449 caused by a split atomic subtract and atomic read that allows a race to happen. - Fix a leak of backing pages when simultaneously reading the same page from the same object from two or more threads. - Fix a hang due to a race between a cache object being discarded and the corresponding cookie being reenabled. There are also some minor cleanups: - Cast an enum value to a different enum type to prevent clang from generating a warning. This shouldn't cause any sort of change in the emitted code. - Use ktime_get_real_seconds() instead of get_seconds(). This is just used to uniquify a filename for an object to be placed in the graveyard. Objects placed there are deleted by cachfilesd in userspace immediately thereafter. - Remove an initialised, but otherwise unused variable. This should have been entirely optimised away anyway" * tag 'fscache-fixes-20181130' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: fscache, cachefiles: remove redundant variable 'cache' cachefiles: avoid deprecated get_seconds() cachefiles: Explicitly cast enumerated type in put_object fscache: fix race between enablement and dropping of object cachefiles: Fix page leak in cachefiles_read_backing_file while vmscan is active fscache: Fix race in fscache_op_complete() due to split atomic_sub & read cachefiles: Fix an assertion failure when trying to update a failed object
2018-11-30ocfs2: fix potential use after freePan Bian
ocfs2_get_dentry() calls iput(inode) to drop the reference count of inode, and if the reference count hits 0, inode is freed. However, in this function, it then reads inode->i_generation, which may result in a use after free bug. Move the put operation later. Link: http://lkml.kernel.org/r/1543109237-110227-1-git-send-email-bianpan2016@163.com Fixes: 781f200cb7a("ocfs2: Remove masklog ML_EXPORT.") Signed-off-by: Pan Bian <bianpan2016@163.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmasAndrea Arcangeli
After the VMA to register the uffd onto is found, check that it has VM_MAYWRITE set before allowing registration. This way we inherit all common code checks before allowing to fill file holes in shmem and hugetlbfs with UFFDIO_COPY. The userfaultfd memory model is not applicable for readonly files unless it's a MAP_PRIVATE. Link: http://lkml.kernel.org/r/20181126173452.26955-4-aarcange@redhat.com Fixes: ff62a3421044 ("hugetlb: implement memfd sealing") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Hugh Dickins <hughd@google.com> Reported-by: Jann Horn <jannh@google.com> Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support") Cc: <stable@vger.kernel.org> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30hfsplus: do not free node before usingPan Bian
hfs_bmap_free() frees node via hfs_bnode_put(node). However it then reads node->this when dumping error message on an error path, which may result in a use-after-free bug. This patch frees node only when it is never used. Link: http://lkml.kernel.org/r/1543053441-66942-1-git-send-email-bianpan2016@163.com Signed-off-by: Pan Bian <bianpan2016@163.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com> Cc: Joe Perches <joe@perches.com> Cc: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30hfs: do not free node before usingPan Bian
hfs_bmap_free() frees the node via hfs_bnode_put(node). However, it then reads node->this when dumping error message on an error path, which may result in a use-after-free bug. This patch frees the node only when it is never again used. Link: http://lkml.kernel.org/r/1542963889-128825-1-git-send-email-bianpan2016@163.com Fixes: a1185ffa2fc ("HFS rewrite") Signed-off-by: Pan Bian <bianpan2016@163.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Joe Perches <joe@perches.com> Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com> Cc: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30ocfs2: fix deadlock caused by ocfs2_defrag_extent()Larry Chen
ocfs2_defrag_extent may fall into deadlock. ocfs2_ioctl_move_extents ocfs2_ioctl_move_extents ocfs2_move_extents ocfs2_defrag_extent ocfs2_lock_allocators_move_extents ocfs2_reserve_clusters inode_lock GLOBAL_BITMAP_SYSTEM_INODE __ocfs2_flush_truncate_log inode_lock GLOBAL_BITMAP_SYSTEM_INODE As backtrace shows above, ocfs2_reserve_clusters() will call inode_lock against the global bitmap if local allocator has not sufficient cluters. Once global bitmap could meet the demand, ocfs2_reserve_cluster will return success with global bitmap locked. After ocfs2_reserve_cluster(), if truncate log is full, __ocfs2_flush_truncate_log() will definitely fall into deadlock because it needs to inode_lock global bitmap, which has already been locked. To fix this bug, we could remove from ocfs2_lock_allocators_move_extents() the code which intends to lock global allocator, and put the removed code after __ocfs2_flush_truncate_log(). ocfs2_lock_allocators_move_extents() is referred by 2 places, one is here, the other does not need the data allocator context, which means this patch does not affect the caller so far. Link: http://lkml.kernel.org/r/20181101071422.14470-1-lchen@suse.com Signed-off-by: Larry Chen <lchen@suse.com> Reviewed-by: Changwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs fixes from Al Viro: "Assorted fixes all over the place. The iov_iter one is this cycle regression (splice from UDP triggering WARN_ON()), the rest is older" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: afs: Use d_instantiate() rather than d_add() and don't d_drop() afs: Fix missing net error handling afs: Fix validation/callback interaction iov_iter: teach csum_and_copy_to_iter() to handle pipe-backed ones exportfs: do not read dentry after free exportfs: fix 'passing zero to ERR_PTR()' warning aio: fix failure to put the file pointer sysv: return 'err' instead of 0 in __sysv_write_inode
2018-11-30Merge tag 'pstore-v4.20-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore fix from Kees Cook: "Fix corrupted compression due to unlucky size choice with ECC" * tag 'pstore-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore/ram: Correctly calculate usable PRZ bytes
2018-11-30fs/locks: allow a lock request to block other requests.NeilBrown
Currently, a lock can block pending requests, but all pending requests are equal. If lots of pending requests are mutually exclusive, this means they will all be woken up and all but one will fail. This can hurt performance. So we will allow pending requests to block other requests. Only the first request will be woken, and it will wake the others. This patch doesn't implement this fully, but prepares the way. - It acknowledges that a request might be blocking other requests, and when the request is converted to a lock, those blocked requests are moved across. - When a request is requeued or discarded, all blocked requests are woken. - When deadlock-detection looks for the lock which blocks a given request, we follow the chain of ->fl_blocker all the way to the top. Tested-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30fs/locks: use properly initialized file_lock when unlocking.NeilBrown
Both locks_remove_posix() and locks_remove_flock() use a struct file_lock without calling locks_init_lock() on it. This means the various list_heads are not initialized, which will become a problem with a later patch. So change them both to initialize properly. For flock locks, this involves using flock_make_lock(), and changing it to allow a file_lock to be passed in, so memory allocation isn't always needed. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30ocfs2: properly initial file_lock used for unlock.NeilBrown
Rather than assuming all-zeros is sufficient, use the available API to initialize the file_lock structure use for unlock. VFS-level changes will soon make it important that the list_heads in file_lock are always properly initialized. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30gfs2: properly initial file_lock used for unlock.NeilBrown
Rather than assuming all-zeros is sufficient, use the available API to initialize the file_lock structure use for unlock. VFS-level changes will soon make it important that the list_heads in file_lock are always properly initialized. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30NFS: use locks_copy_lock() to copy locks.NeilBrown
Using memcpy() to copy lock requests leaves the various list_head in an inconsistent state. As we will soon attach a list of conflicting request to another pending request, we need these lists to be consistent. So change NFS to use locks_init_lock/locks_copy_lock instead of memcpy. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30fs/locks: split out __locks_wake_up_blocks().NeilBrown
This functionality will be useful in future patches, so split it out from locks_wake_up_blocks(). Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30fs/locks: rename some lists and pointers.NeilBrown
struct file lock contains an 'fl_next' pointer which is used to point to the lock that this request is blocked waiting for. So rename it to fl_blocker. The fl_blocked list_head in an active lock is the head of a list of blocked requests. In a request it is a node in that list. These are two distinct uses, so replace with two list_heads with different names. fl_blocked_requests is the head of a list of blocked requests fl_blocked_member is a node in a member of that list. The two different list_heads are never used at the same time, but that will change in a future patch. Note that a tracepoint is changed to report fl_blocker instead of fl_next. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-30fscache, cachefiles: remove redundant variable 'cache'Colin Ian King
Variable 'cache' is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'cache' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-11-30cachefiles: avoid deprecated get_seconds()Arnd Bergmann
get_seconds() returns an unsigned long can overflow on some architectures and is deprecated because of that. In cachefs, we cast that number to a a 32-bit integer, which will overflow in year 2106 on all architectures. As confirmed by David Howells, the overflow probably isn't harmful in the end, since the timestamps are only used to make the file names unique, but they don't strictly have to be in monotonically increasing order since the files only exist in order to be deleted as quickly as possible. Moving to ktime_get_real_seconds() avoids the deprecated interface. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Howells <dhowells@redhat.com>
2018-11-30cachefiles: Explicitly cast enumerated type in put_objectNathan Chancellor
Clang warns when one enumerated type is implicitly converted to another. fs/cachefiles/namei.c:247:50: warning: implicit conversion from enumeration type 'enum cachefiles_obj_ref_trace' to different enumeration type 'enum fscache_obj_ref_trace' [-Wenum-conversion] cache->cache.ops->put_object(&xobject->fscache, cachefiles_obj_put_wait_retry); Silence this warning by explicitly casting to fscache_obj_ref_trace, which is also done in put_object. Reported-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-11-30fscache: fix race between enablement and dropping of objectNeilBrown
It was observed that a process blocked indefintely in __fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP to be cleared via fscache_wait_for_deferred_lookup(). At this time, ->backing_objects was empty, which would normaly prevent __fscache_read_or_alloc_page() from getting to the point of waiting. This implies that ->backing_objects was cleared *after* __fscache_read_or_alloc_page was was entered. When an object is "killed" and then "dropped", FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is ->backing_objects cleared. This leaves a window where something else can set FSCACHE_COOKIE_LOOKING_UP and __fscache_read_or_alloc_page() can start waiting, before ->backing_objects is cleared There is some uncertainty in this analysis, but it seems to be fit the observations. Adding the wake in this patch will be handled correctly by __fscache_read_or_alloc_page(), as it checks if ->backing_objects is empty again, after waiting. Customer which reported the hang, also report that the hang cannot be reproduced with this fix. The backtrace for the blocked process looked like: PID: 29360 TASK: ffff881ff2ac0f80 CPU: 3 COMMAND: "zsh" #0 [ffff881ff43efbf8] schedule at ffffffff815e56f1 #1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed #2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8 #3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e #4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache] #5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache] #6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs] #7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs] #8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73 #9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs] #10 [ffff881ff43efe70] __vfs_read at ffffffff811fc756 #11 [ffff881ff43efee8] vfs_read at ffffffff811fccfa #12 [ffff881ff43eff18] sys_read at ffffffff811fda62 #13 [ffff881ff43eff50] entry_SYSCALL_64_fastpath at ffffffff815e986e Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-11-30fs: fix lost error code in dio_completeMaximilian Heyne
commit e259221763a40403d5bb232209998e8c45804ab8 ("fs: simplify the generic_write_sync prototype") reworked callers of generic_write_sync(), and ended up dropping the error return for the directio path. Prior to that commit, in dio_complete(), an error would be bubbled up the stack, but after that commit, errors passed on to dio_complete were eaten up. This was reported on the list earlier, and a fix was proposed in https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/, but never followed up with. We recently hit this bug in our testing where fencing io errors, which were previously erroring out with EIO, were being returned as success operations after this commit. The fix proposed on the list earlier was a little short -- it would have still called generic_write_sync() in case `ret` already contained an error. This fix ensures generic_write_sync() is only called when there's no pending error in the write. Additionally, transferred is replaced with ret to bring this code in line with other callers. Fixes: e259221763a4 ("fs: simplify the generic_write_sync prototype") Reported-by: Ravi Nankani <rnankani@amazon.com> Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Reviewed-by: Christoph Hellwig <hch@lst.de> CC: Torsten Mehlan <tomeh@amazon.de> CC: Uwe Dannowski <uwed@amazon.de> CC: Amit Shah <aams@amazon.de> CC: David Woodhouse <dwmw@amazon.co.uk> CC: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-30block: avoid extra bio reference for async O_DIRECTChristoph Hellwig
The bio referencing has a trick that doesn't do any actual atomic inc/dec on the reference count until we have to elevator to > 1. For the async IO O_DIRECT case, we can't use the simple DIO variants, so we use __blkdev_direct_IO(). It always grabs an extra reference to the bio after allocation, which means we then enter the slower path of actually having to do atomic_inc/dec on the count. We don't need to do that for the async case, unless we end up going multi-bio, in which case we're already doing huge amounts of IO. For the smaller IO case (< BIO_MAX_PAGES), we can do without the extra ref. Based on an earlier patch (and commit log) from Jens Axboe. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-29afs: Use d_instantiate() rather than d_add() and don't d_drop()David Howells
Use d_instantiate() rather than d_add() and don't d_drop() in afs_vnode_new_inode(). The dentry shouldn't be removed as it's not changing its name. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-29afs: Fix missing net error handlingDavid Howells
kAFS can be given certain network errors (EADDRNOTAVAIL, EHOSTDOWN and ERFKILL) that it doesn't handle in its server/address rotation algorithms. They cause the probing and rotation to abort immediately rather than rotating. Fix this by: (1) Abstracting out the error prioritisation from the VL and FS rotation algorithms into a common function and expand usage into the server probing code. When multiple errors are available, this code selects the one we'd prefer to return. (2) Add handling for EADDRNOTAVAIL, EHOSTDOWN and ERFKILL. Fixes: 0fafdc9f888b ("afs: Fix file locking") Fixes: 0338747d8454 ("afs: Probe multiple fileservers simultaneously") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-29afs: Fix validation/callback interactionDavid Howells
When afs_validate() is called to validate a vnode (inode), there are two unhandled cases in the fastpath at the top of the function: (1) If the vnode is promised (AFS_VNODE_CB_PROMISED is set), the break counters match and the data has expired, then there's an implicit case in which the vnode needs revalidating. This has no consequences since the default "valid = false" set at the top of the function happens to do the right thing. (2) If the vnode is not promised and it hasn't been deleted (AFS_VNODE_DELETED is not set) then there's a default case we're not handling in which the vnode is invalid. If the vnode is invalid, we need to bring cb_s_break and cb_v_break up to date before we refetch the status. As a consequence, once the server loses track of the client (ie. sufficient time has passed since we last sent it an operation), it will send us a CB.InitCallBackState* operation when we next try to talk to it. This calls afs_init_callback_state() which increments afs_server::cb_s_break, but this then doesn't propagate to the afs_vnode record. The result being that every afs_validate() call thereafter sends a status fetch operation to the server. Clarify and fix this by: (A) Setting valid in all the branches rather than initialising it at the top so that the compiler catches where we've missed. (B) Restructuring the logic in the 'promised' branch so that we set valid to false if the callback is due to expire (or has expired) and so that the final case is that the vnode is still valid. (C) Adding an else-statement that ups cb_s_break and cb_v_break if the promised and deleted cases don't match. Fixes: c435ee34551e ("afs: Overhaul the callback handling") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-29VFS: use synchronize_rcu_expedited() in namespace_unlock()NeilBrown
The synchronize_rcu() in namespace_unlock() is called every time a filesystem is unmounted. If a great many filesystems are mounted, this can cause a noticable slow-down in, for example, system shutdown. The sequence: mkdir -p /tmp/Mtest/{0..5000} time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done time umount /tmp/Mtest/* on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and 100 seconds to unmount them. Boot the same VM with 1 CPU and it takes 18 seconds to mount the tmpfs filesystems, but only 36 to unmount. If we change the synchronize_rcu() to synchronize_rcu_expedited() the umount time on a 4-cpu VM drop to 0.6 seconds I think this 200-fold speed up is worth the slightly high system impact of using synchronize_rcu_expedited(). Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective) Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-29pstore/ram: Correctly calculate usable PRZ bytesKees Cook
The actual number of bytes stored in a PRZ is smaller than the bytes requested by platform data, since there is a header on each PRZ. Additionally, if ECC is enabled, there are trailing bytes used as well. Normally this mismatch doesn't matter since PRZs are circular buffers and the leading "overflow" bytes are just thrown away. However, in the case of a compressed record, this rather badly corrupts the results. This corruption was visible with "ramoops.mem_size=204800 ramoops.ecc=1". Any stored crashes would not be uncompressable (producing a pstorefs "dmesg-*.enc.z" file), and triggering errors at boot: [ 2.790759] pstore: crypto_comp_decompress failed, ret = -22! Backporting this depends on commit 70ad35db3321 ("pstore: Convert console write to use ->write_buf") Reported-by: Joel Fernandes <joel@joelfernandes.org> Fixes: b0aad7a99c1d ("pstore: Add compression support to pstore") Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2018-11-29Merge tag 'fixes_for_v4.20-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2 and udf fixes from Jan Kara: "Three small ext2 and udf fixes" * tag 'fixes_for_v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: fix potential use after free ext2: initialize opts.s_mount_opt as zero before using it udf: Allow mounting volumes with incorrect identification strings
2018-11-28nfsd: clean up indentation, increase indentation in switch statementColin Ian King
Trivial fix to clean up indentation, add in missing tabs. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>