summaryrefslogtreecommitdiff
path: root/fs/fuse
AgeCommit message (Collapse)Author
2020-08-11Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - IRQ bypass support for vdpa and IFC - MLX5 vdpa driver - Endianness fixes for virtio drivers - Misc other fixes * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (71 commits) vdpa/mlx5: fix up endian-ness for mtu vdpa: Fix pointer math bug in vdpasim_get_config() vdpa/mlx5: Fix pointer math in mlx5_vdpa_get_config() vdpa/mlx5: fix memory allocation failure checks vdpa/mlx5: Fix uninitialised variable in core/mr.c vdpa_sim: init iommu lock virtio_config: fix up warnings on parisc vdpa/mlx5: Add VDPA driver for supported mlx5 devices vdpa/mlx5: Add shared memory registration code vdpa/mlx5: Add support library for mlx5 VDPA implementation vdpa/mlx5: Add hardware descriptive header file vdpa: Modify get_vq_state() to return error code net/vdpa: Use struct for set/get vq state vdpa: remove hard coded virtq num vdpasim: support batch updating vhost-vdpa: support IOTLB batching hints vhost-vdpa: support get/set backend features vhost: generialize backend features setting/getting vhost-vdpa: refine ioctl pre-processing vDPA: dont change vq irq after DRIVER_OK ...
2020-08-05virtio_fs: convert to LE accessorsMichael S. Tsirkin
Virtio fs is modern-only. Use LE accessors for config space. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-04Merge tag 'uninit-macro-v5.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull uninitialized_var() macro removal from Kees Cook: "This is long overdue, and has hidden too many bugs over the years. The series has several "by hand" fixes, and then a trivial treewide replacement. - Clean up non-trivial uses of uninitialized_var() - Update documentation and checkpatch for uninitialized_var() removal - Treewide removal of uninitialized_var()" * tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: compiler: Remove uninitialized_var() macro treewide: Remove uninitialized_var() usage checkpatch: Remove awareness of uninitialized_var() macro mm/debug_vm_pgtable: Remove uninitialized_var() usage f2fs: Eliminate usage of uninitialized_var() macro media: sur40: Remove uninitialized_var() usage KVM: PPC: Book3S PR: Remove uninitialized_var() usage clk: spear: Remove uninitialized_var() usage clk: st: Remove uninitialized_var() usage spi: davinci: Remove uninitialized_var() usage ide: Remove uninitialized_var() usage rtlwifi: rtl8192cu: Remove uninitialized_var() usage b43: Remove uninitialized_var() usage drbd: Remove uninitialized_var() usage x86/mm/numa: Remove uninitialized_var() usage docs: deprecated.rst: Add uninitialized_var()
2020-07-16treewide: Remove uninitialized_var() usageKees Cook
Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-15fuse: Fix parameter for FS_IOC_{GET,SET}FLAGSChirantan Ekbote
The ioctl encoding for this parameter is a long but the documentation says it should be an int and the kernel drivers expect it to be an int. If the fuse driver treats this as a long it might end up scribbling over the stack of a userspace process that only allocated enough space for an int. This was previously discussed in [1] and a patch for fuse was proposed in [2]. From what I can tell the patch in [2] was nacked in favor of adding new, "fixed" ioctls and using those from userspace. However there is still no "fixed" version of these ioctls and the fact is that it's sometimes infeasible to change all userspace to use the new one. Handling the ioctls specially in the fuse driver seems like the most pragmatic way for fuse servers to support them without causing crashes in userspace applications that call them. [1]: https://lore.kernel.org/linux-fsdevel/20131126200559.GH20559@hall.aurel32.net/T/ [2]: https://sourceforge.net/p/fuse/mailman/message/31771759/ Signed-off-by: Chirantan Ekbote <chirantan@chromium.org> Fixes: 59efec7b9039 ("fuse: implement ioctl support") Cc: <stable@vger.kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-14fuse: don't ignore errors from fuse_writepages_fill()Vasily Averin
fuse_writepages() ignores some errors taken from fuse_writepages_fill() I believe it is a bug: if .writepages is called with WB_SYNC_ALL it should either guarantee that all data was successfully saved or return error. Fixes: 26d614df1da9 ("fuse: Implement writepages callback") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-14fuse: clean up condition for writepage sendingMiklos Szeredi
fuse_writepages_fill uses following construction: if (wpa && ap->num_pages && (A || B || C)) { action; } else if (wpa && D) { if (E) { the same action; } } - ap->num_pages check is always true and can be removed - "if" and "else if" calls the same action and can be merged. Move checking A, B, C, D, E conditions to a helper, add comments. Original-patch-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-14fuse: reject options on reconfigure via fsconfig(2)Miklos Szeredi
Previous patch changed handling of remount/reconfigure to ignore all options, including those that are unknown to the fuse kernel fs. This was done for backward compatibility, but this likely only affects the old mount(2) API. The new fsconfig(2) based reconfiguration could possibly be improved. This would make the new API less of a drop in replacement for the old, OTOH this is a good chance to get rid of some weirdnesses in the old API. Several other behaviors might make sense: 1) unknown options are rejected, known options are ignored 2) unknown options are rejected, known options are rejected if the value is changed, allowed otherwise 3) all options are rejected Prior to the backward compatibility fix to ignore all options all known options were accepted (1), even if they change the value of a mount parameter; fuse_reconfigure() does not look at the config values set by fuse_parse_param(). To fix that we'd need to verify that the value provided is the same as set in the initial configuration (2). The major drawback is that this is much more complex than just rejecting all attempts at changing options (3); i.e. all options signify initial configuration values and don't make sense on reconfigure. This patch opts for (3) with the rationale that no mount options are reconfigurable in fuse. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-14fuse: ignore 'data' argument of mount(..., MS_REMOUNT)Miklos Szeredi
The command mount -o remount -o unknownoption /mnt/fuse succeeds on kernel versions prior to v5.4 and fails on kernel version at or after. This is because fuse_parse_param() rejects any unrecognised options in case of FS_CONTEXT_FOR_RECONFIGURE, just as for FS_CONTEXT_FOR_MOUNT. This causes a regression in case the fuse filesystem is in fstab, since remount sends all options found there to the kernel; even ones that are meant for the initial mount and are consumed by the userspace fuse server. Fix this by ignoring mount options, just as fuse_remount_fs() did prior to the conversion to the new API. Reported-by: Stefan Priebe <s.priebe@profihost.ag> Fixes: c30da2e981a7 ("fuse: convert to use the new mount API") Cc: <stable@vger.kernel.org> # v5.4 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-14fuse: use ->reconfigure() instead of ->remount_fs()Miklos Szeredi
s_op->remount_fs() is only called from legacy_reconfigure(), which is not used after being converted to the new API. Convert to using ->reconfigure(). This restores the previous behavior of syncing the filesystem and rejecting MS_MANDLOCK on remount. Fixes: c30da2e981a7 ("fuse: convert to use the new mount API") Cc: <stable@vger.kernel.org> # v5.4 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-14fuse: fix warning in tree_insert() and clean up writepage insertionMiklos Szeredi
fuse_writepages_fill() calls tree_insert() with ap->num_pages = 0 which triggers the following warning: WARNING: CPU: 1 PID: 17211 at fs/fuse/file.c:1728 tree_insert+0xab/0xc0 [fuse] RIP: 0010:tree_insert+0xab/0xc0 [fuse] Call Trace: fuse_writepages_fill+0x5da/0x6a0 [fuse] write_cache_pages+0x171/0x470 fuse_writepages+0x8a/0x100 [fuse] do_writepages+0x43/0xe0 Fix up the warning and clean up the code around rb-tree insertion: - Rename tree_insert() to fuse_insert_writeback() and make it return the conflicting entry in case of failure - Re-add tree_insert() as a wrapper around fuse_insert_writeback() - Rename fuse_writepage_in_flight() to fuse_writepage_add() and reverse the meaning of the return value to mean + "true" in case the writepage entry was successfully added + "false" in case it was in-fligt queued on an existing writepage entry's auxiliary list or the existing writepage entry's temporary page updated Switch from fuse_find_writeback() + tree_insert() to fuse_insert_writeback() - Move setting orig_pages to before inserting/updating the entry; this may result in the orig_pages value being discarded later in case of an in-flight request - In case of a new writepage entry use fuse_writepage_add() unconditionally, only set data->wpa if the entry was added. Fixes: 6b2fb79963fb ("fuse: optimize writepages search") Reported-by: kernel test robot <rong.a.chen@intel.com> Original-path-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-07-14fuse: move rb_erase() before tree_insert()Miklos Szeredi
In fuse_writepage_end() the old writepages entry needs to be removed from the rbtree before inserting the new one, otherwise tree_insert() would fail. This is a very rare codepath and no reproducer exists. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-06-09Merge tag 'fuse-update-5.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Fix a rare deadlock in virtiofs - Fix st_blocks in writeback cache mode - Fix wrong checks in splice move causing spurious warnings - Fix a race between a GETATTR request and a FUSE_NOTIFY_INVAL_INODE notification - Use rb-tree instead of linear search for pages currently under writeout by userspace - Fix copy_file_range() inconsistencies * tag 'fuse-update-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: copy_file_range should truncate cache fuse: fix copy_file_range cache issues fuse: optimize writepages search fuse: update attr_version counter on fuse_notify_inval_inode() fuse: don't check refcount after stealing page fuse: fix weird page warning fuse: use dump_page virtiofs: do not use fuse_fill_super_common() for device installation fuse: always allow query of st_dev fuse: always flush dirty data on close(2) fuse: invalidate inode attr in writeback cache mode fuse: Update stale comment in queue_interrupt() fuse: BUG_ON correction in fuse_dev_splice_write() virtiofs: Add mount option and atime behavior to the doc virtiofs: schedule blocking async replies in separate worker
2020-06-03Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: "More mm/ work, plenty more to come Subsystems affected by this patch series: slub, memcg, gup, kasan, pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs, thp, mmap, kconfig" * akpm: (131 commits) arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined riscv: support DEBUG_WX mm: add DEBUG_WX support drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid() powerpc/mm: drop platform defined pmd_mknotpresent() mm: thp: don't need to drain lru cache when splitting and mlocking THP hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs sparc32: register memory occupied by kernel as memblock.memory include/linux/memblock.h: fix minor typo and unclear comment mm, mempolicy: fix up gup usage in lookup_node tools/vm/page_owner_sort.c: filter out unneeded line mm: swap: memcg: fix memcg stats for huge pages mm: swap: fix vmstats for huge pages mm: vmscan: limit the range of LRU type balancing mm: vmscan: reclaim writepage is IO cost mm: vmscan: determine anon/file pressure balance at the reclaim root mm: balance LRU lists based on relative thrashing mm: only count actual rotations as LRU reclaim cost ...
2020-06-03mm: fold and remove lru_cache_add_anon() and lru_cache_add_file()Johannes Weiner
They're the same function, and for the purpose of all callers they are equivalent to lru_cache_add(). [akpm@linux-foundation.org: fix it for local_lock changes] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Rik van Riel <riel@surriel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200520232525.798933-5-hannes@cmpxchg.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03Merge branch 'work.splice' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull splice updates from Al Viro: "Christoph's assorted splice cleanups" * 'work.splice' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: rename pipe_buf ->steal to ->try_steal fs: make the pipe_buf_operations ->confirm operation optional fs: make the pipe_buf_operations ->steal operation optional trace: remove tracing_pipe_buf_ops pipe: merge anon_pipe_buf*_ops fs: simplify do_splice_from fs: simplify do_splice_to
2020-06-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: "A few little subsystems and a start of a lot of MM patches. Subsystems affected by this patch series: squashfs, ocfs2, parisc, vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup, swap, memcg, pagemap, memory-failure, vmalloc, kasan" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (128 commits) kasan: move kasan_report() into report.c mm/mm_init.c: report kasan-tag information stored in page->flags ubsan: entirely disable alignment checks under UBSAN_TRAP kasan: fix clang compilation warning due to stack protector x86/mm: remove vmalloc faulting mm: remove vmalloc_sync_(un)mappings() x86/mm/32: implement arch_sync_kernel_mappings() x86/mm/64: implement arch_sync_kernel_mappings() mm/ioremap: track which page-table levels were modified mm/vmalloc: track which page-table levels were modified mm: add functions to track page directory modifications s390: use __vmalloc_node in stack_alloc powerpc: use __vmalloc_node in alloc_vm_stack arm64: use __vmalloc_node in arch_alloc_vmap_stack mm: remove vmalloc_user_node_flags mm: switch the test_vmalloc module to use __vmalloc_node mm: remove __vmalloc_node_flags_caller mm: remove both instances of __vmalloc_node_flags mm: remove the prot argument to __vmalloc_node mm: remove the pgprot argument to __vmalloc ...
2020-06-02fuse: convert from readpages to readaheadMatthew Wilcox (Oracle)
Implement the new readahead operation in fuse by using __readahead_batch() to fill the array of pages in fuse_args_pages directly. This lets us inline fuse_readpages_fill() into fuse_readahead(). [willy@infradead.org: build fix] Link: http://lkml.kernel.org/r/20200415025938.GB5820@bombadil.infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Acked-by: Miklos Szeredi <mszeredi@redhat.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: http://lkml.kernel.org/r/20200414150233.24495-25-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-20fs: rename pipe_buf ->steal to ->try_stealChristoph Hellwig
And replace the arcane return value convention with a simple bool where true means success and false means failure. [AV: braino fix folded in] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-20fuse: copy_file_range should truncate cacheMiklos Szeredi
After the copy operation completes the cache is not up-to-date. Truncate all pages in the interval that has successfully been copied. Truncating completely copied dirty pages is okay, since the data has been overwritten anyway. Truncating partially copied dirty pages is not okay; add a comment for now. Fixes: 88bc7d5097a1 ("fuse: add support for copy_file_range()") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-20fuse: fix copy_file_range cache issuesMiklos Szeredi
a) Dirty cache needs to be written back not just in the writeback_cache case, since the dirty pages may come from memory maps. b) The fuse_writeback_range() helper takes an inclusive interval, so the end position needs to be pos+len-1 instead of pos+len. Fixes: 88bc7d5097a1 ("fuse: add support for copy_file_range()") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-19fuse: optimize writepages searchMaxim Patlasov
Re-work fi->writepages, replacing list with rb-tree. This improves performance because kernel fuse iterates through fi->writepages for each writeback page and typical number of entries is about 800 (for 100MB of fuse writeback). Before patch: 10240+0 records in 10240+0 records out 10737418240 bytes (11 GB) copied, 41.3473 s, 260 MB/s 2 1 0 57445400 40416 6323676 0 0 33 374743 8633 19210 1 8 88 3 0 29.86% [kernel] [k] _raw_spin_lock 26.62% [fuse] [k] fuse_page_is_writeback After patch: 10240+0 records in 10240+0 records out 10737418240 bytes (11 GB) copied, 21.4954 s, 500 MB/s 2 9 0 53676040 31744 10265984 0 0 64 854790 10956 48387 1 6 88 6 0 23.55% [kernel] [k] copy_user_enhanced_fast_string 9.87% [kernel] [k] __memcpy 3.10% [kernel] [k] _raw_spin_lock Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-19fuse: update attr_version counter on fuse_notify_inval_inode()Miklos Szeredi
A GETATTR request can race with FUSE_NOTIFY_INVAL_INODE, resulting in the attribute cache being updated with stale information after the invalidation. Fix this by bumping the attribute version in fuse_reverse_inval_inode(). Reported-by: Krzysztof Rusek <rusek@9livesdata.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-19fuse: don't check refcount after stealing pageMiklos Szeredi
page_count() is unstable. Unless there has been an RCU grace period between when the page was removed from the page cache and now, a speculative reference may exist from the page cache. Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-19fuse: fix weird page warningMiklos Szeredi
When PageWaiters was added, updating this check was missed. Reported-by: Nikolaus Rath <Nikolaus@rath.org> Reported-by: Hugh Dickins <hughd@google.com> Fixes: 62906027091f ("mm: add PageWaiters indicating tasks are waiting for a page bit") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-19fuse: use dump_pageMiklos Szeredi
Instead of custom page dumping, use the standard helper. Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-19virtiofs: do not use fuse_fill_super_common() for device installationVivek Goyal
fuse_fill_super_common() allocates and installs one fuse_device. Hence virtiofs allocates and install all fuse devices by itself except one. This makes logic little twisted. There does not seem to be any real need that why virtiofs can't allocate and install all fuse devices itself. So opt out of fuse device allocation and installation while calling fuse_fill_super_common(). Regular fuse still wants fuse_fill_super_common() to install fuse_device. It needs to prevent against races where two mounters are trying to mount fuse using same fd. In that case one will succeed while other will get -EINVAL. virtiofs does not have this issue because sget_fc() resolves the race w.r.t multiple mounters and only one instance of virtio_fs_fill_super() should be in progress for same filesystem. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-19fuse: always allow query of st_devMiklos Szeredi
Fuse mounts without "allow_other" are off-limits to all non-owners. Yet it makes sense to allow querying st_dev on the root, since this value is provided by the kernel, not the userspace filesystem. Allow statx(2) with a zero request mask to succeed on a fuse mounts for all users. Reported-by: Nikolaus Rath <Nikolaus@rath.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-19fuse: always flush dirty data on close(2)Miklos Szeredi
We want cached data to synced with the userspace filesystem on close(), for example to allow getting correct st_blocks value. Do this regardless of whether the userspace filesystem implements a FLUSH method or not. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-19fuse: invalidate inode attr in writeback cache modeEryu Guan
Under writeback mode, inode->i_blocks is not updated, making utils du read st.blocks as 0. For example, when using virtiofs (cache=always & nondax mode) with writeback_cache enabled, writing a new file and check its disk usage with du, du reports 0 usage. # uname -r 5.6.0-rc6+ # mount -t virtiofs virtiofs /mnt/virtiofs # rm -f /mnt/virtiofs/testfile # create new file and do extend write # xfs_io -fc "pwrite 0 4k" /mnt/virtiofs/testfile wrote 4096/4096 bytes at offset 0 4 KiB, 1 ops; 0.0001 sec (28.103 MiB/sec and 7194.2446 ops/sec) # du -k /mnt/virtiofs/testfile 0 <==== disk usage is 0 # stat -c %s,%b /mnt/virtiofs/testfile 4096,0 <==== i_size is correct, but st_blocks is 0 Fix it by invalidating attr in fuse_flush(), so we get up-to-date attr from server on next getattr. Signed-off-by: Eryu Guan <eguan@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-04-20docs: fix broken references to text filesMauro Carvalho Chehab
Several references got broken due to txt to ReST conversion. Several of them can be automatically fixed with: scripts/documentation-file-ref-check --fix Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> # hwtracing/coresight/Kconfig Reviewed-by: Paul E. McKenney <paulmck@kernel.org> # memory-barrier.txt Acked-by: Alex Shi <alex.shi@linux.alibaba.com> # translations/zh_CN Acked-by: Federico Vaga <federico.vaga@vaga.pv.it> # translations/it_IT Acked-by: Marc Zyngier <maz@kernel.org> # kvm/arm64 Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/6f919ddb83a33b5f2a63b6b5f0575737bb2b36aa.1586881715.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-04-20fuse: Update stale comment in queue_interrupt()Kirill Tkhai
Fixes: 04ec5af0776e "fuse: export fuse_end_request()" Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-04-20fuse: BUG_ON correction in fuse_dev_splice_write()Vasily Averin
commit 963545357202 ("fuse: reduce allocation size for splice_write") changed size of bufs array, so BUG_ON which checks the index of the array shold also be fixed. [SzM: turn BUG_ON into WARN_ON] Fixes: 963545357202 ("fuse: reduce allocation size for splice_write") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-04-20virtiofs: schedule blocking async replies in separate workerVivek Goyal
In virtiofs (unlike in regular fuse) processing of async replies is serialized. This can result in a deadlock in rare corner cases when there's a circular dependency between the completion of two or more async replies. Such a deadlock can be reproduced with xfstests:generic/503 if TEST_DIR == SCRATCH_MNT (which is a misconfiguration): - Process A is waiting for page lock in worker thread context and blocked (virtio_fs_requests_done_work()). - Process B is holding page lock and waiting for pending writes to finish (fuse_wait_on_page_writeback()). - Write requests are waiting in virtqueue and can't complete because worker thread is blocked on page lock (process A). Fix this by creating a unique work_struct for each async reply that can block (O_DIRECT read). Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem") Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-02-13fuse: fix stack use after returnMiklos Szeredi
Normal, synchronous requests will have their args allocated on the stack. After the FR_FINISHED bit is set by receiving the reply from the userspace fuse server, the originating task may return and reuse the stack frame, resulting in an Oops if the args structure is dereferenced. Fix by setting a flag in the request itself upon initializing, indicating whether it has an asynchronous ->end() callback. Reported-by: Kyle Sanderson <kyle.leet@gmail.com> Reported-by: Michael Stapelberg <michael+lkml@stapelberg.ch> Fixes: 2b319d1f6f92 ("fuse: don't dereference req->args on finished request") Cc: <stable@vger.kernel.org> # v5.4 Tested-by: Michael Stapelberg <michael+lkml@stapelberg.ch> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-02-08Merge branch 'merge.nfs-fs_parse.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs file system parameter updates from Al Viro: "Saner fs_parser.c guts and data structures. The system-wide registry of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is the horror switch() in fs_parse() that would have to grow another case every time something got added to that system-wide registry. New syntax types can be added by filesystems easily now, and their namespace is that of functions - not of system-wide enum members. IOW, they can be shared or kept private and if some turn out to be widely useful, we can make them common library helpers, etc., without having to do anything whatsoever to fs_parse() itself. And we already get that kind of requests - the thing that finally pushed me into doing that was "oh, and let's add one for timeouts - things like 15s or 2h". If some filesystem really wants that, let them do it. Without somebody having to play gatekeeper for the variants blessed by direct support in fs_parse(), TYVM. Quite a bit of boilerplate is gone. And IMO the data structures make a lot more sense now. -200LoC, while we are at it" * 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits) tmpfs: switch to use of invalfc() cgroup1: switch to use of errorfc() et.al. procfs: switch to use of invalfc() hugetlbfs: switch to use of invalfc() cramfs: switch to use of errofc() et.al. gfs2: switch to use of errorfc() et.al. fuse: switch to use errorfc() et.al. ceph: use errorfc() and friends instead of spelling the prefix out prefix-handling analogues of errorf() and friends turn fs_param_is_... into functions fs_parse: handle optional arguments sanely fs_parse: fold fs_parameter_desc/fs_parameter_spec fs_parser: remove fs_parameter_description name field add prefix to fs_context->log ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log new primitive: __fs_parse() switch rbd and libceph to p_log-based primitives struct p_log, variants of warnf() et.al. taking that one instead teach logfc() to handle prefices, give it saner calling conventions get rid of cg_invalf() ...
2020-02-07fuse: switch to use errorfc() et.al.Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07fs_parse: fold fs_parameter_desc/fs_parameter_specAl Viro
The former contains nothing but a pointer to an array of the latter... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07fs_parser: remove fs_parameter_description name fieldEric Sandeen
Unused now. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-06fuse: use true,false for bool variablezhengbin
Fixes coccicheck warning: fs/fuse/readdir.c:335:1-19: WARNING: Assignment of 0/1 to bool variable fs/fuse/file.c:1398:2-19: WARNING: Assignment of 0/1 to bool variable fs/fuse/file.c:1400:2-20: WARNING: Assignment of 0/1 to bool variable fs/fuse/cuse.c:454:1-20: WARNING: Assignment of 0/1 to bool variable fs/fuse/cuse.c:455:1-19: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:497:2-17: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:504:2-23: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:511:2-22: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:518:2-23: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:522:2-26: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:526:2-18: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:1000:1-20: WARNING: Assignment of 0/1 to bool variable Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-02-06fuse: Support RENAME_WHITEOUT flagVivek Goyal
Allow fuse to pass RENAME_WHITEOUT to fuse server. Overlayfs on top of virtiofs uses RENAME_WHITEOUT. Without this patch renaming a directory in overlayfs (dir is on lower) fails with -EINVAL. With this patch it works. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-02-06fuse: don't overflow LLONG_MAX with end offsetMiklos Szeredi
Handle the special case of fuse_readpages() wanting to read the last page of a hugest file possible and overflowing the end offset in the process. This is basically to unbreak xfstests:generic/525 and prevent filesystems from doing bad things with an overflowing offset. Reported-by: Xiao Yang <ice_yangxiao@163.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-02-06fix up iter on short count in fuse_direct_io()Miklos Szeredi
fuse_direct_io() can end up advancing the iterator by more than the amount of data read or written. This case is handled by the generic code if going through ->direct_IO(), but not in the FOPEN_DIRECT_IO case. Fix by reverting the extra bytes from the iterator in case of error or a short count. To test: install lxcfs, then the following testcase int fd = open("/var/lib/lxcfs/proc/uptime", O_RDONLY); sendfile(1, fd, NULL, 16777216); sendfile(1, fd, NULL, 16777216); will spew WARN_ON() in iov_iter_pipe(). Reported-by: Peter Geis <pgwipeout@gmail.com> Reported-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: 3c3db095b68c ("fuse: use iov_iter based generic splice helpers") Cc: <stable@vger.kernel.org> # v5.1 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-01-16fuse: fix fuse_send_readpages() in the syncronous read caseMiklos Szeredi
Buffered read in fuse normally goes via: -> generic_file_buffered_read() -> fuse_readpages() -> fuse_send_readpages() ->fuse_simple_request() [called since v5.4] In the case of a read request, fuse_simple_request() will return a non-negative bytecount on success or a negative error value. A positive bytecount was taken to be an error and the PG_error flag set on the page. This resulted in generic_file_buffered_read() falling back to ->readpage(), which would repeat the read request and succeed. Because of the repeated read succeeding the bug was not detected with regression tests or other use cases. The FTP module in GVFS however fails the second read due to the non-seekable nature of FTP downloads. Fix by checking and ignoring positive return value from fuse_simple_request(). Reported-by: Ondrej Holy <oholy@redhat.com> Link: https://gitlab.gnome.org/GNOME/gvfs/issues/441 Fixes: 134831e36bbd ("fuse: convert readpages to simple api") Cc: <stable@vger.kernel.org> # v5.4 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-12-06pipe: Fix iteration end check in fuse_dev_splice_write()David Howells
Fix the iteration end check in fuse_dev_splice_write(). The iterator position can only be compared with == or != since wrappage may be involved. Fixes: 8cefc107ca54 ("pipe: Use head and tail pointers for the ring, not cursor and length") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-05Merge tag 'fuse-update-5.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse update from Miklos Szeredi: - Fix a regression introduced in the last release - Fix a number of issues with validating data coming from userspace - Some cleanups in virtiofs * tag 'fuse-update-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix Kconfig indentation fuse: fix leak of fuse_io_priv virtiofs: Use completions while waiting for queue to be drained virtiofs: Do not send forget request "struct list_head" element virtiofs: Use a common function to send forget virtiofs: Fix old-style declaration fuse: verify nlink fuse: verify write return fuse: verify attributes
2019-12-01Merge tag 'compat-ioctl-5.5' of ↵Linus Torvalds
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann: "As part of the cleanup of some remaining y2038 issues, I came to fs/compat_ioctl.c, which still has a couple of commands that need support for time64_t. In completely unrelated work, I spent time on cleaning up parts of this file in the past, moving things out into drivers instead. After Al Viro reviewed an earlier version of this series and did a lot more of that cleanup, I decided to try to completely eliminate the rest of it and move it all into drivers. This series incorporates some of Al's work and many patches of my own, but in the end stops short of actually removing the last part, which is the scsi ioctl handlers. I have patches for those as well, but they need more testing or possibly a rewrite" * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits) scsi: sd: enable compat ioctls for sed-opal pktcdvd: add compat_ioctl handler compat_ioctl: move SG_GET_REQUEST_TABLE handling compat_ioctl: ppp: move simple commands into ppp_generic.c compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic compat_ioctl: unify copy-in of ppp filters tty: handle compat PPP ioctls compat_ioctl: move SIOCOUTQ out of compat_ioctl.c compat_ioctl: handle SIOCOUTQNSD af_unix: add compat_ioctl support compat_ioctl: reimplement SG_IO handling compat_ioctl: move WDIOC handling into wdt drivers fs: compat_ioctl: move FITRIM emulation into file systems gfs2: add compat_ioctl support compat_ioctl: remove unused convert_in_user macro compat_ioctl: remove last RAID handling code compat_ioctl: remove /dev/raw ioctl translation compat_ioctl: remove PCI ioctl translation compat_ioctl: remove joystick ioctl translation ...
2019-11-30Merge tag 'notifications-pipe-prep-20191115' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull pipe rework from David Howells: "This is my set of preparatory patches for building a general notification queue on top of pipes. It makes a number of significant changes: - It removes the nr_exclusive argument from __wake_up_sync_key() as this is always 1. This prepares for the next step: - Adds wake_up_interruptible_sync_poll_locked() so that poll can be woken up from a function that's holding the poll waitqueue spinlock. - Change the pipe buffer ring to be managed in terms of unbounded head and tail indices rather than bounded index and length. This means that reading the pipe only needs to modify one index, not two. - A selection of helper functions are provided to query the state of the pipe buffer, plus a couple to apply updates to the pipe indices. - The pipe ring is allowed to have kernel-reserved slots. This allows many notification messages to be spliced in by the kernel without allowing userspace to pin too many pages if it writes to the same pipe. - Advance the head and tail indices inside the pipe waitqueue lock and use wake_up_interruptible_sync_poll_locked() to poke poll without having to take the lock twice. - Rearrange pipe_write() to preallocate the buffer it is going to write into and then drop the spinlock. This allows kernel notifications to then be added the ring whilst it is filling the buffer it allocated. The read side is stalled because the pipe mutex is still held. - Don't wake up readers on a pipe if there was already data in it when we added more. - Don't wake up writers on a pipe if the ring wasn't full before we removed a buffer" * tag 'notifications-pipe-prep-20191115' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: pipe: Remove sync on wake_ups pipe: Increase the writer-wakeup threshold to reduce context-switch count pipe: Check for ring full inside of the spinlock in pipe_write() pipe: Remove redundant wakeup from pipe_write() pipe: Rearrange sequence in pipe_write() to preallocate slot pipe: Conditionalise wakeup in pipe_read() pipe: Advance tail pointer inside of wait spinlock in pipe_read() pipe: Allow pipes to have kernel-reserved slots pipe: Use head and tail pointers for the ring, not cursor and length Add wake_up_interruptible_sync_poll_locked() Remove the nr_exclusive argument from __wake_up_sync_key() pipe: Reduce #inclusion of pipe_fs_i.h
2019-11-27fuse: fix Kconfig indentationKrzysztof Kozlowski
Adjust indentation from spaces to tab (+optional two spaces) as in coding style with command like: $ sed -e 's/^ /\t/' -i */Kconfig Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-11-27fuse: fix leak of fuse_io_privMiklos Szeredi
exit_aio() is sometimes stuck in wait_for_completion() after aio is issued with direct IO and the task receives a signal. The reason is failure to call ->ki_complete() due to a leaked reference to fuse_io_priv. This happens in fuse_async_req_send() if fuse_simple_background() returns an error (e.g. -EINTR). In this case the error value is propagated via io->err, so return success to not confuse callers. This issue is tracked as a virtio-fs issue: https://gitlab.com/virtio-fs/qemu/issues/14 Reported-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Fixes: 45ac96ed7c36 ("fuse: convert direct_io to simple api") Cc: <stable@vger.kernel.org> # v5.4 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>