summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2021-12-07cifs: Fix crash on unload of cifs_arc4.koVincent Whitchurch
The exit function is wrongly placed in the __init section and this leads to a crash when the module is unloaded. Just remove both the init and exit functions since this module does not need them. Fixes: 71c02863246167b3d ("cifs: fork arc4 and create a separate module...") Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: stable@vger.kernel.org # 5.15 Signed-off-by: Steve French <stfrench@microsoft.com>
2021-12-08erofs: Replace zero-length array with flexible-array memberGao Xiang
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use `flexible array members' [1] for these cases. The older style of one-element or zero-length arrays should no longer be used [2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.15/process/deprecated.html#zero-length-and-one-element-arrays Link: https://lore.kernel.org/r/20211206121702.221331-1-hsiangkao@linux.alibaba.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-08erofs: add sysfs node to control sync decompression strategyHuang Jianan
Although readpage is a synchronous path, there will be no additional kworker scheduling overhead in non-atomic contexts together with dm-verity. Let's add a sysfs node to disable sync decompression as an option. Link: https://lore.kernel.org/r/20211206143552.8384-1-huangjianan@oppo.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Huang Jianan <huangjianan@oppo.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-08erofs: add sysfs interfaceHuang Jianan
Add sysfs interface to configure erofs related parameters later. Link: https://lore.kernel.org/r/20211201145436.4357-1-huangjianan@oppo.com Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Huang Jianan <huangjianan@oppo.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-07io_uring: split io_req_complete_post() and add a helperHao Xu
Split io_req_complete_post(), this is a prep for the next patch. Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/20211207093951.247840-5-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-07io_uring: add helper for task work execution codeHao Xu
Add a helper for task work execution code. We will use it later. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/20211207093951.247840-4-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-07io_uring: add a priority tw list for irq completion workHao Xu
Now we have a lot of task_work users, some are just to complete a req and generate a cqe. Let's put the work to a new tw list which has a higher priority, so that it can be handled quickly and thus to reduce avg req latency and users can issue next round of sqes earlier. An explanatory case: origin timeline: submit_sqe-->irq-->add completion task_work -->run heavy work0~n-->run completion task_work now timeline: submit_sqe-->irq-->add completion task_work -->run completion task_work-->run heavy work0~n Limitation: this optimization is only for those that submission and reaping process are in different threads. Otherwise anyhow we have to submit new sqes after returning to userspace, then the order of TWs doesn't matter. Tested this patch(and the following ones) by manually replace __io_queue_sqe() in io_queue_sqe() by io_req_task_queue() to construct 'heavy' task works. Then test with fio: ioengine=io_uring sqpoll=1 thread=1 bs=4k direct=1 rw=randread time_based=1 runtime=600 randrepeat=0 group_reporting=1 filename=/dev/nvme0n1 Tried various iodepth. The peak IOPS for this patch is 710K, while the old one is 665K. For avg latency, difference shows when iodepth grow: depth and avg latency(usec): depth new old 1 7.05 7.10 2 8.47 8.60 4 10.42 10.42 8 13.78 13.22 16 27.41 24.33 32 49.40 53.08 64 102.53 103.36 128 196.98 205.61 256 372.99 414.88 512 747.23 791.30 1024 1472.59 1538.72 2048 3153.49 3329.01 4096 6387.86 6682.54 8192 12150.25 12774.14 16384 23085.58 26044.71 Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20211207093951.247840-3-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-07io-wq: add helper to merge two wq_listsHao Xu
add a helper to merge two wq_lists, it will be useful in the next patches. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/20211207093951.247840-2-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-07fs: dlm: memory cache for lowcomms hotpathAlexander Aring
This patch introduces a kmem cache for dlm_msg handles which are used always if dlm sends a message out. Even if their are covered by midcomms layer or not. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07fs: dlm: memory cache for writequeue_entryAlexander Aring
This patch introduces a kmem cache for writequeue entry. A writequeue entry get quite a lot allocated if dlm transmit messages. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07fs: dlm: memory cache for midcomms hotpathAlexander Aring
This patch will introduce a kmem cache for allocating message handles which are needed for midcomms layer to take track of lowcomms messages. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07fs: dlm: remove wq_alloc mutexAlexander Aring
This patch cleanups the code for allocating a new buffer in the dlm writequeue mechanism. There was a possible tuneup to allow scheduling while a new writequeue entry needs to be allocated because either no sending page is available or are full. To avoid multiple concurrent users checking at the same time if an entry is available or full alloc_wq was introduce that those are waiting if there is currently a new writequeue entry in process to be queued so possible further users will check on the new allocated writequeue entry if it's full. To simplify the code we just remove this mutex and switch that the already introduced spin lock will be held during writequeue check, allocation and queueing. So other users can never check on available writequeues while there is a new one in process but not queued yet. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07fs: dlm: use event based wait for pending removeAlexander Aring
This patch will use an event based waitqueue to wait for a possible clash with the ls_remove_name field of dlm_ls instead of doing busy waiting. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07fs: dlm: check for pending users filling buffersAlexander Aring
Currently we don't care if the DLM application stack is filling buffers (not committed yet) while we transmit some already committed buffers. By checking on active writequeue users before dequeue a writequeue entry we know there is coming more data and do nothing. We wait until the send worker will be triggered again if the writequeue entry users hit zero. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07fs: dlm: use list_empty() to check last iterationAlexander Aring
This patch will use list_empty(&ls->ls_cb_delay) to check for last list iteration. In case of a multiply count of MAX_CB_QUEUE and the list is empty we do a extra goto more which we can avoid by checking on list_empty(). Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07xfs: remove all COW fork extents when remounting readonlyDarrick J. Wong
As part of multiple customer escalations due to file data corruption after copy on write operations, I wrote some fstests that use fsstress to hammer on COW to shake things loose. Regrettably, I caught some filesystem shutdowns due to incorrect rmap operations with the following loop: mount <filesystem> # (0) fsstress <run only readonly ops> & # (1) while true; do fsstress <run all ops> mount -o remount,ro # (2) fsstress <run only readonly ops> mount -o remount,rw # (3) done When (2) happens, notice that (1) is still running. xfs_remount_ro will call xfs_blockgc_stop to walk the inode cache to free all the COW extents, but the blockgc mechanism races with (1)'s reader threads to take IOLOCKs and loses, which means that it doesn't clean them all out. Call such a file (A). When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which walks the ondisk refcount btree and frees any COW extent that it finds. This function does not check the inode cache, which means that incore COW forks of inode (A) is now inconsistent with the ondisk metadata. If one of those former COW extents are allocated and mapped into another file (B) and someone triggers a COW to the stale reservation in (A), A's dirty data will be written into (B) and once that's done, those blocks will be transferred to (A)'s data fork without bumping the refcount. The results are catastrophic -- file (B) and the refcount btree are now corrupt. Solve this race by forcing the xfs_blockgc_free_space to run synchronously, which causes xfs_icwalk to return to inodes that were skipped because the blockgc code couldn't take the IOLOCK. This is safe to do here because the VFS has already prohibited new writer threads. Fixes: 10ddf64e420f ("xfs: remove leftover CoW reservations when remounting ro") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
2021-12-07netfs: fix parameter of cleanup()Jeffle Xu
The order of these two parameters is just reversed. gcc didn't warn on that, probably because 'void *' can be converted from or to other pointer types without warning. Cc: stable@vger.kernel.org Fixes: 3d3c95046742 ("netfs: Provide readahead and readpage netfs helpers") Fixes: e1b1240c1ff5 ("netfs: Add write_begin helper") Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Link: https://lore.kernel.org/r/20211207031449.100510-1-jefflexu@linux.alibaba.com/ # v1
2021-12-07fuse: Pass correct lend value to filemap_write_and_wait_range()Xie Yongji
The acceptable maximum value of lend parameter in filemap_write_and_wait_range() is LLONG_MAX rather than -1. And there is also some logic depending on LLONG_MAX check in write_cache_pages(). So let's pass LLONG_MAX to filemap_write_and_wait_range() in fuse_writeback_range() instead. Fixes: 59bda8ecee2f ("fuse: flush extending writes") Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Cc: <stable@vger.kernel.org> # v5.15 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-12-07netfs: Fix lockdep warning from taking sb_writers whilst holding mmap_lockDavid Howells
Taking sb_writers whilst holding mmap_lock isn't allowed and will result in a lockdep warning like that below. The problem comes from cachefiles needing to take the sb_writers lock in order to do a write to the cache, but being asked to do this by netfslib called from readpage, readahead or write_begin[1]. Fix this by always offloading the write to the cache off to a worker thread. The main thread doesn't need to wait for it, so deadlock can be avoided. This can be tested by running the quick xfstests on something like afs or ceph with lockdep enabled. WARNING: possible circular locking dependency detected 5.15.0-rc1-build2+ #292 Not tainted ------------------------------------------------------ holetest/65517 is trying to acquire lock: ffff88810c81d730 (mapping.invalidate_lock#3){.+.+}-{3:3}, at: filemap_fault+0x276/0x7a5 but task is already holding lock: ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&mm->mmap_lock#2){++++}-{3:3}: validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b __might_fault+0x87/0xb1 strncpy_from_user+0x25/0x18c removexattr+0x7c/0xe5 __do_sys_fremovexattr+0x73/0x96 do_syscall_64+0x67/0x7a entry_SYSCALL_64_after_hwframe+0x44/0xae -> #1 (sb_writers#10){.+.+}-{0:0}: validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b cachefiles_write+0x2b3/0x4bb netfs_rreq_do_write_to_cache+0x3b5/0x432 netfs_readpage+0x2de/0x39d filemap_read_page+0x51/0x94 filemap_get_pages+0x26f/0x413 filemap_read+0x182/0x427 new_sync_read+0xf0/0x161 vfs_read+0x118/0x16e ksys_read+0xb8/0x12e do_syscall_64+0x67/0x7a entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (mapping.invalidate_lock#3){.+.+}-{3:3}: check_noncircular+0xe4/0x129 check_prev_add+0x16b/0x3a4 validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b down_read+0x40/0x4a filemap_fault+0x276/0x7a5 __do_fault+0x96/0xbf do_fault+0x262/0x35a __handle_mm_fault+0x171/0x1b5 handle_mm_fault+0x12a/0x233 do_user_addr_fault+0x3d2/0x59c exc_page_fault+0x85/0xa5 asm_exc_page_fault+0x1e/0x30 other info that might help us debug this: Chain exists of: mapping.invalidate_lock#3 --> sb_writers#10 --> &mm->mmap_lock#2 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_lock#2); lock(sb_writers#10); lock(&mm->mmap_lock#2); lock(mapping.invalidate_lock#3); *** DEADLOCK *** 1 lock held by holetest/65517: #0: ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c stack backtrace: CPU: 0 PID: 65517 Comm: holetest Not tainted 5.15.0-rc1-build2+ #292 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: dump_stack_lvl+0x45/0x59 check_noncircular+0xe4/0x129 ? print_circular_bug+0x207/0x207 ? validate_chain+0x461/0x4a8 ? add_chain_block+0x88/0xd9 ? hlist_add_head_rcu+0x49/0x53 check_prev_add+0x16b/0x3a4 validate_chain+0x3c4/0x4a8 ? check_prev_add+0x3a4/0x3a4 ? mark_lock+0xa5/0x1c6 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b ? filemap_fault+0x276/0x7a5 ? rcu_read_unlock+0x59/0x59 ? add_to_page_cache_lru+0x13c/0x13c ? lock_is_held_type+0x7b/0xd3 down_read+0x40/0x4a ? filemap_fault+0x276/0x7a5 filemap_fault+0x276/0x7a5 ? pagecache_get_page+0x2dd/0x2dd ? __lock_acquire+0x8bc/0x949 ? pte_offset_kernel.isra.0+0x6d/0xc3 __do_fault+0x96/0xbf ? do_fault+0x124/0x35a do_fault+0x262/0x35a ? handle_pte_fault+0x1c1/0x20d __handle_mm_fault+0x171/0x1b5 ? handle_pte_fault+0x20d/0x20d ? __lock_release+0x151/0x254 ? mark_held_locks+0x1f/0x78 ? rcu_read_unlock+0x3a/0x59 handle_mm_fault+0x12a/0x233 do_user_addr_fault+0x3d2/0x59c ? pgtable_bad+0x70/0x70 ? rcu_read_lock_bh_held+0xab/0xab exc_page_fault+0x85/0xa5 ? asm_exc_page_fault+0x8/0x30 asm_exc_page_fault+0x1e/0x30 RIP: 0033:0x40192f Code: ff 48 89 c3 48 8b 05 50 28 00 00 48 85 ed 7e 23 31 d2 4b 8d 0c 2f eb 0a 0f 1f 00 48 8b 05 39 28 00 00 48 0f af c2 48 83 c2 01 <48> 89 1c 01 48 39 d5 7f e8 8b 0d f2 27 00 00 31 c0 85 c9 74 0e 8b RSP: 002b:00007f9931867eb0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 00007f9931868700 RCX: 00007f993206ac00 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007ffc13e06ee0 RBP: 0000000000000100 R08: 0000000000000000 R09: 00007f9931868700 R10: 00007f99318689d0 R11: 0000000000000202 R12: 00007ffc13e06ee0 R13: 0000000000000c00 R14: 00007ffc13e06e00 R15: 00007f993206a000 Fixes: 726218fdc22c ("netfs: Define an interface to talk to a cache") Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Jeff Layton <jlayton@kernel.org> cc: Jan Kara <jack@suse.cz> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20210922110420.GA21576@quack2.suse.cz/ [1] Link: https://lore.kernel.org/r/163887597541.1596626.2668163316598972956.stgit@warthog.procyon.org.uk/ # v1
2021-12-06io-wq: remove spurious bit clear on task_work additionJens Axboe
There's a small race here where the task_work could finish and drop the worker itself, so that by the time that task_work_add() returns with a successful addition we've already put the worker. The worker callbacks clear this bit themselves, so we don't actually need to manually clear it in the caller. Get rid of it. Reported-by: syzbot+b60c982cb0efc5e05a47@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-05io_uring: reuse io_req_task_complete for timeoutsPavel Begunkov
With kbuf unification io_req_task_complete() is now a generic function, use it for timeout's tw completions. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7142fa3cbaf3a4140d59bcba45cbe168cf40fac2.1638714983.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-05io_uring: tweak iopoll CQE_SKIP event countingPavel Begunkov
When iopolling the userspace specifies the minimum number of "events" it expects. Previously, we had one CQE per request, so the definition of an "event" was unequivocal, but that's not more the case anymore with REQ_F_CQE_SKIP. Currently it counts the number of completed requests, replace it with the number of posted CQEs. This allows users of the "one CQE per link" scheme to wait for all N links in a single syscall, which is not possible without the patch and requires extra context switches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d5a965c4d2249827392037bbd0186f87fea49c55.1638714983.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-05io_uring: simplify selected buf handlingPavel Begunkov
As selected buffers are now stored in a separate field in a request, get rid of rw/recv specific helpers and simplify the code. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/bd4a866d8d91b044f748c40efff9e4eacd07536e.1638714983.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-05io_uring: move up io_put_kbuf() and io_put_rw_kbuf()Hao Xu
Move them up to avoid explicit declaration. We will use them in later patches. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3631243d6fc4a79bbba0cd62597fc8cd5be95924.1638714983.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-05fs: support mapped mounts of mapped filesystemsChristian Brauner
In previous patches we added new and modified existing helpers to handle idmapped mounts of filesystems mounted with an idmapping. In this final patch we convert all relevant places in the vfs to actually pass the filesystem's idmapping into these helpers. With this the vfs is in shape to handle idmapped mounts of filesystems mounted with an idmapping. Note that this is just the generic infrastructure. Actually adding support for idmapped mounts to a filesystem mountable with an idmapping is follow-up work. In this patch we extend the definition of an idmapped mount from a mount that that has the initial idmapping attached to it to a mount that has an idmapping attached to it which is not the same as the idmapping the filesystem was mounted with. As before we do not allow the initial idmapping to be attached to a mount. In addition this patch prevents that the idmapping the filesystem was mounted with can be attached to a mount created based on this filesystem. This has multiple reasons and advantages. First, attaching the initial idmapping or the filesystem's idmapping doesn't make much sense as in both cases the values of the i_{g,u}id and other places where k{g,u}ids are used do not change. Second, a user that really wants to do this for whatever reason can just create a separate dedicated identical idmapping to attach to the mount. Third, we can continue to use the initial idmapping as an indicator that a mount is not idmapped allowing us to continue to keep passing the initial idmapping into the mapping helpers to tell them that something isn't an idmapped mount even if the filesystem is mounted with an idmapping. Link: https://lore.kernel.org/r/20211123114227.3124056-11-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-11-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-11-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-12-05fs: port higher-level mapping helpersChristian Brauner
Enable the mapped_fs{g,u}id() helpers to support filesystems mounted with an idmapping. Apart from core mapping helpers that use mapped_fs{g,u}id() to initialize struct inode's i_{g,u}id fields xfs is the only place that uses these low-level helpers directly. The patch only extends the helpers to be able to take the filesystem idmapping into account. Since we don't actually yet pass the filesystem's idmapping in no functional changes happen. This will happen in a final patch. Link: https://lore.kernel.org/r/20211123114227.3124056-9-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-9-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-9-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-12-04Merge tag 'xfs-5.16-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fix from Darrick Wong: "Remove an unnecessary (and backwards) rename flags check that duplicates a VFS level check" * tag 'xfs-5.16-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: remove incorrect ASSERT in xfs_rename
2021-12-04Merge tag '5.16-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Three SMB3 multichannel/fscache fixes and a DFS fix. In testing multichannel reconnect scenarios recently various problems with the cifs.ko implementation of fscache were found (e.g. incorrect initialization of fscache cookies in some cases)" * tag '5.16-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: avoid use of dstaddr as key for fscache client cookie cifs: add server conn_id to fscache client cookie cifs: wait for tcon resource_id before getting fscache super cifs: fix missed refcounting of ipc tcon
2021-12-04gfs2: Fix gfs2_instantiate descriptionAndreas Gruenbacher
The description of gfs2_instantiate accidentally lists a glock argument, but the function takes a glock holder. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-12-04gfs2: Remove redundant check for GLF_INSTANTIATE_NEEDEDAndreas Gruenbacher
If the GLF_INSTANTIATE_NEEDED flag isn't set, gfs2_instantiate() is a no-op. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-12-04gfs2: remove redundant set of INSTANTIATE_NEEDEDBob Peterson
Function rgrp_go_inval calls gfs2_rgrp_brelse to invalidate the in-core rgrp structures. After the call it set GLF_INSTANTIATE_NEEDED, which is redundant, since gfs2_rgrp_brelse also sets it. This patch simply removes the redundant set_bit. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-12-04gfs2: Fix __gfs2_holder_init function name in kernel-doc commentAndreas Gruenbacher
The function name in the kernel-doc comment wasn't updated when the function was renamed. Fixes: b016d9a84abd ("gfs2: Save ip from gfs2_glock_nq_init") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-12-04f2fs: implement iomap operationsEric Biggers
Implement 'struct iomap_ops' for f2fs, in preparation for making f2fs use iomap for direct I/O. Note that this may be used for other things besides direct I/O in the future; however, for now I've only tested it for direct I/O. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-12-04f2fs: fix the f2fs_file_write_iter tracepointEric Biggers
Pass in the original position and count rather than the position and count that were updated by the write. Also use the correct types for all arguments, in particular the file offset which was being truncated to 32 bits on 32-bit platforms. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-12-04f2fs: do not expose unwritten blocks to user by DIOJaegeuk Kim
DIO preallocates physical blocks before writing data, but if an error occurrs or power-cut happens, we can see block contents from the disk. This patch tries to fix it by 1) turning to buffered writes for DIO into holes, 2) truncating unwritten blocks from error or power-cut. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-12-04f2fs: reduce indentation in f2fs_file_write_iter()Eric Biggers
Replace 'if (ret > 0)' with 'if (ret <= 0) goto out_unlock;'. No change in behavior. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-12-04fsdax: don't require CONFIG_BLOCKChristoph Hellwig
The file system DAX code now does not require the block code. So allow building a kernel with fuse DAX but not block layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-30-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04iomap: build the block based code conditionallyChristoph Hellwig
Only build the block based iomap code if CONFIG_BLOCK is set. Currently that is always the case, but it will change soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-29-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04fsdax: shift partition offset handling into the file systemsChristoph Hellwig
Remove the last user of ->bdev in dax.c by requiring the file system to pass in an address that already includes the DAX offset. As part of the only set ->bdev or ->daxdev when actually required in the ->iomap_begin methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> [erofs] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-27-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dax: return the partition offset from fs_dax_get_by_bdevChristoph Hellwig
Prepare for the removal of the block_device from the DAX I/O path by returning the partition offset from fs_dax_get_by_bdev so that the file systems have it at hand for use during I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-26-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04iomap: add a IOMAP_DAX flagChristoph Hellwig
Add a flag so that the file system can easily detect DAX operations based just on the iomap operation requested instead of looking at inode state using IS_DAX. This will be needed to apply the to be added partition offset only for operations that actually use DAX, but not things like fiemap that are based on the block device. In the long run it should also allow turning the bdev, dax_dev and inline_data into a union. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-25-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04xfs: pass the mapping flags to xfs_bmbt_to_iomapChristoph Hellwig
To prepare for looking at the IOMAP_DAX flag in xfs_bmbt_to_iomap pass in the input mapping flags to xfs_bmbt_to_iomap. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-24-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04xfs: use xfs_direct_write_iomap_ops for DAX zeroingChristoph Hellwig
While the buffered write iomap ops do work due to the fact that zeroing never allocates blocks, the DAX zeroing should use the direct ops just like actual DAX I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-23-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04xfs: move dax device handling into xfs_{alloc,free}_buftargChristoph Hellwig
Hide the DAX device lookup from the xfs_super.c code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211129102203.2243509-22-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04ext4: cleanup the dax handling in ext4_fill_superChristoph Hellwig
Only call fs_dax_get_by_bdev once the sbi has been allocated and remove the need for the dax_dev local variable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-21-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04ext2: cleanup the dax handling in ext2_fill_superChristoph Hellwig
Only call fs_dax_get_by_bdev once the sbi has been allocated and remove the need for the dax_dev local variable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-20-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04fsdax: decouple zeroing from the iomap buffered I/O codeChristoph Hellwig
Unshare the DAX and iomap buffered I/O page zeroing code. This code previously did a IS_DAX check deep inside the iomap code, which in fact was the only DAX check in the code. Instead move these checks into the callers. Most callers already have DAX special casing anyway and XFS will need it for reflink support as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-19-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04fsdax: factor out a dax_memzero helperChristoph Hellwig
Factor out a helper for the "manual" zeroing of a DAX range to clean up dax_iomap_zero a lot. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-18-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04fsdax: simplify the offset check in dax_iomap_zeroChristoph Hellwig
The file relative offset must have the same alignment as the storage offset, so use that and get rid of the call to iomap_sector. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-17-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04xfs: add xfs_zero_range and xfs_truncate_page helpersShiyang Ruan
Add helpers to prepare for using different DAX operations. Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> [hch: split from a larger patch + slight cleanups] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-16-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>