summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2022-05-23Merge tag 'for-5.19/io_uring-passthrough-2022-05-22' of ↵Linus Torvalds
git://git.kernel.dk/linux-block Pull io_uring NVMe command passthrough from Jens Axboe: "On top of everything else, this adds support for passthrough for io_uring. The initial feature for this is NVMe passthrough support, which allows non-filesystem based IO commands and admin commands. To support this, io_uring grows support for SQE and CQE members that are twice as big, allowing to pass in a full NVMe command without having to copy data around. And to complete with more than just a single 32-bit value as the output" * tag 'for-5.19/io_uring-passthrough-2022-05-22' of git://git.kernel.dk/linux-block: (22 commits) io_uring: cleanup handling of the two task_work lists nvme: enable uring-passthrough for admin commands nvme: helper for uring-passthrough checks blk-mq: fix passthrough plugging nvme: add vectored-io support for uring-cmd nvme: wire-up uring-cmd support for io-passthru on char-device. nvme: refactor nvme_submit_user_cmd() block: wire-up support for passthrough plugging fs,io_uring: add infrastructure for uring-cmd io_uring: support CQE32 for nop operation io_uring: enable CQE32 io_uring: support CQE32 in /proc info io_uring: add tracing for additional CQE32 fields io_uring: overflow processing for CQE32 io_uring: flush completions for CQE32 io_uring: modify io_get_cqe for CQE32 io_uring: add CQE32 completion processing io_uring: add CQE32 setup processing io_uring: change ring size calculation for CQE32 io_uring: store add. return values for CQE32 ...
2022-05-23Merge tag 'for-5.19/io_uring-net-2022-05-22' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring 'more data in socket' support from Jens Axboe: "To be able to fully utilize the 'poll first' support in the core io_uring branch, it's advantageous knowing if the socket was empty after a receive. This adds support for that" * tag 'for-5.19/io_uring-net-2022-05-22' of git://git.kernel.dk/linux-block: io_uring: return hint on whether more data is available after receive tcp: pass back data left in socket after receive
2022-05-23Merge tag 'for-5.19/io_uring-socket-2022-05-22' of ↵Linus Torvalds
git://git.kernel.dk/linux-block Pull io_uring socket() support from Jens Axboe: "This adds support for socket(2) for io_uring. This is handy when using direct / registered file descriptors with io_uring. Outside of those two patches, a small series from Dylan on top that improves the tracing by providing a text representation of the opcode rather than needing to decode this by reading the header file every time. That sits in this branch as it was the last opcode added (until it wasn't...)" * tag 'for-5.19/io_uring-socket-2022-05-22' of git://git.kernel.dk/linux-block: io_uring: use the text representation of ops in trace io_uring: rename op -> opcode io_uring: add io_uring_get_opcode io_uring: add type to op enum io_uring: add socket(2) support net: add __sys_socket_file()
2022-05-23Merge tag 'for-5.19/io_uring-xattr-2022-05-22' of ↵Linus Torvalds
git://git.kernel.dk/linux-block Pull io_uring xattr support from Jens Axboe: "Support for the xattr variants" * tag 'for-5.19/io_uring-xattr-2022-05-22' of git://git.kernel.dk/linux-block: io_uring: cleanup error-handling around io_req_complete io_uring: fix trace for reduced sqe padding io_uring: add fgetxattr and getxattr support io_uring: add fsetxattr and setxattr support fs: split off do_getxattr from getxattr fs: split off setxattr_copy and do_setxattr function from setxattr
2022-05-23Merge tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring updates from Jens Axboe: "Here are the main io_uring changes for 5.19. This contains: - Fixes for sparse type warnings (Christoph, Vasily) - Support for multi-shot accept (Hao) - Support for io_uring managed fixed files, rather than always needing the applicationt o manage the indices (me) - Fix for a spurious poll wakeup (Dylan) - CQE overflow fixes (Dylan) - Support more types of cancelations (me) - Support for co-operative task_work signaling, rather than always forcing an IPI (me) - Support for doing poll first when appropriate, rather than always attempting a transfer first (me) - Provided buffer cleanups and support for mapped buffers (me) - Improve how io_uring handles inflight SCM files (Pavel) - Speedups for registered files (Pavel, me) - Organize the completion data in a struct in io_kiocb rather than keep it in separate spots (Pavel) - task_work improvements (Pavel) - Cleanup and optimize the submission path, in general and for handling links (Pavel) - Speedups for registered resource handling (Pavel) - Support sparse buffers and file maps (Pavel, me) - Various fixes and cleanups (Almog, Pavel, me)" * tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-block: (111 commits) io_uring: fix incorrect __kernel_rwf_t cast io_uring: disallow mixed provided buffer group registrations io_uring: initialize io_buffer_list head when shared ring is unregistered io_uring: add fully sparse buffer registration io_uring: use rcu_dereference in io_close io_uring: consistently use the EPOLL* defines io_uring: make apoll_events a __poll_t io_uring: drop a spurious inline on a forward declaration io_uring: don't use ERR_PTR for user pointers io_uring: use a rwf_t for io_rw.flags io_uring: add support for ring mapped supplied buffers io_uring: add io_pin_pages() helper io_uring: add buffer selection support to IORING_OP_NOP io_uring: fix locking state for empty buffer group io_uring: implement multishot mode for accept io_uring: let fast poll support multishot io_uring: add REQ_F_APOLL_MULTISHOT for requests io_uring: add IORING_ACCEPT_MULTISHOT for accept io_uring: only wake when the correct events are set io_uring: avoid io-wq -EAGAIN looping for !IOPOLL ...
2022-05-23nfsd: destroy percpu stats counters after reply cache shutdownJulian Schroeder
Upon nfsd shutdown any pending DRC cache is freed. DRC cache use is tracked via a percpu counter. In the current code the percpu counter is destroyed before. If any pending cache is still present, percpu_counter_add is called with a percpu counter==NULL. This causes a kernel crash. The solution is to destroy the percpu counter after the cache is freed. Fixes: e567b98ce9a4b (“nfsd: protect concurrent access to nfsd stats counters”) Signed-off-by: Julian Schroeder <jumaco@amazon.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-23f2fs: fix typo in commentJulia Lawall
Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-23writeback: fix typo in commentJulia Lawall
Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Link: https://lore.kernel.org/r/20220521111145.81697-32-Julia.Lawall@inria.fr Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Jan Kara <jack@suse.cz>
2022-05-23nfsd: Fix null-ptr-deref in nfsd_fill_super()Zhang Xiaoxu
KASAN report null-ptr-deref as follows: BUG: KASAN: null-ptr-deref in nfsd_fill_super+0xc6/0xe0 [nfsd] Write of size 8 at addr 000000000000005d by task a.out/852 CPU: 7 PID: 852 Comm: a.out Not tainted 5.18.0-rc7-dirty #66 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 kasan_report+0xab/0x120 ? nfsd_mkdir+0x71/0x1c0 [nfsd] ? nfsd_fill_super+0xc6/0xe0 [nfsd] nfsd_fill_super+0xc6/0xe0 [nfsd] ? nfsd_mkdir+0x1c0/0x1c0 [nfsd] get_tree_keyed+0x8e/0x100 vfs_get_tree+0x41/0xf0 __do_sys_fsconfig+0x590/0x670 ? fscontext_read+0x180/0x180 ? anon_inode_getfd+0x4f/0x70 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae This can be reproduce by concurrent operations: 1. fsopen(nfsd)/fsconfig 2. insmod/rmmod nfsd Since the nfsd file system is registered before than nfsd_net allocated, the caller may get the file_system_type and use the nfsd_net before it allocated, then null-ptr-deref occurred. So init_nfsd() should call register_filesystem() last. Fixes: bd5ae9288d64 ("nfsd: register pernet ops last, unregister first") Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-23nfsd: Unregister the cld notifier when laundry_wq create failedZhang Xiaoxu
If laundry_wq create failed, the cld notifier should be unregistered. Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-23SUNRPC: Use RMW bitops in single-threaded hot pathsChuck Lever
I noticed CPU pipeline stalls while using perf. Once an svc thread is scheduled and executing an RPC, no other processes will touch svc_rqst::rq_flags. Thus bus-locked atomics are not needed outside the svc thread scheduler. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-23NFSD: Clean up the show_nf_flags() macroChuck Lever
The flags are defined using C macros, so TRACE_DEFINE_ENUM is unnecessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-23NFSD: Trace filecache opensChuck Lever
Instrument calls to nfsd_open_verified() to get a sense of the filecache hit rate. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-23NFSD: Move documenting comment for nfsd4_process_open2()Chuck Lever
Clean up nfsd4_open() by converting a large comment at the only call site for nfsd4_process_open2() to a kerneldoc comment in front of that function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-23NFSD: Fix whitespaceChuck Lever
Clean up: Pull case arms back one tab stop to conform every other switch statement in fs/nfsd/nfs4proc.c. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-23NFSD: Remove dprintk call sites from tail of nfsd4_open()Chuck Lever
Clean up: These relics are not likely to benefit server administrators. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-23NFSD: Instantiate a struct file when creating a regular NFSv4 fileChuck Lever
There have been reports of races that cause NFSv4 OPEN(CREATE) to return an error even though the requested file was created. NFSv4 does not provide a status code for this case. To mitigate some of these problems, reorganize the NFSv4 OPEN(CREATE) logic to allocate resources before the file is actually created, and open the new file while the parent directory is still locked. Two new APIs are added: + Add an API that works like nfsd_file_acquire() but does not open the underlying file. The OPEN(CREATE) path can use this API when it already has an open file. + Add an API that is kin to dentry_open(). NFSD needs to create a file and grab an open "struct file *" atomically. The alloc_empty_file() has to be done before the inode create. If it fails (for example, because the NFS server has exceeded its max_files limit), we avoid creating the file and can still return an error to the NFS client. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=382 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: JianHong Yin <jiyin@redhat.com>
2022-05-23fanotify: fix incorrect fmode_t castsVasily Averin
Fixes sparce warnings: fs/notify/fanotify/fanotify_user.c:267:63: sparse: warning: restricted fmode_t degrades to integer fs/notify/fanotify/fanotify_user.c:1351:28: sparse: warning: restricted fmode_t degrades to integer FMODE_NONTIFY have bitwise fmode_t type and requires __force attribute for any casts. Signed-off-by: Vasily Averin <vvs@openvz.org> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/9adfd6ac-1b89-791e-796b-49ada3293985@openvz.org
2022-05-23exfat: check if cluster num is validTadeusz Struk
Syzbot reported slab-out-of-bounds read in exfat_clear_bitmap. This was triggered by reproducer calling truncute with size 0, which causes the following trace: BUG: KASAN: slab-out-of-bounds in exfat_clear_bitmap+0x147/0x490 fs/exfat/balloc.c:174 Read of size 8 at addr ffff888115aa9508 by task syz-executor251/365 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack_lvl+0x1e2/0x24b lib/dump_stack.c:118 print_address_description+0x81/0x3c0 mm/kasan/report.c:233 __kasan_report mm/kasan/report.c:419 [inline] kasan_report+0x1a4/0x1f0 mm/kasan/report.c:436 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report_generic.c:309 exfat_clear_bitmap+0x147/0x490 fs/exfat/balloc.c:174 exfat_free_cluster+0x25a/0x4a0 fs/exfat/fatent.c:181 __exfat_truncate+0x99e/0xe00 fs/exfat/file.c:217 exfat_truncate+0x11b/0x4f0 fs/exfat/file.c:243 exfat_setattr+0xa03/0xd40 fs/exfat/file.c:339 notify_change+0xb76/0xe10 fs/attr.c:336 do_truncate+0x1ea/0x2d0 fs/open.c:65 Move the is_valid_cluster() helper from fatent.c to a common header to make it reusable in other *.c files. And add is_valid_cluster() to validate if cluster number is within valid range in exfat_clear_bitmap() and exfat_set_bitmap(). Link: https://syzkaller.appspot.com/bug?id=50381fc73821ecae743b8cf24b4c9a04776f767c Reported-by: syzbot+a4087e40b9c13aad7892@syzkaller.appspotmail.com Fixes: 1e49a94cf707 ("exfat: add bitmap operations") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-05-23exfat: reduce block requests when zeroing a clusterYuezhang Mo
If 'dirsync' is enabled, when zeroing a cluster, submitting sector by sector will generate many block requests, will cause the block device to not fully perform its performance. This commit makes the sectors in a cluster to be submitted in once, it will reduce the number of block requests. This will make the block device to give full play to its performance. Test create 1000 directories on SD card with: $ time (for ((i=0;i<1000;i++)); do mkdir dir${i}; done) Performance has been improved by more than 73% on imx6q-sabrelite. Cluster size Before After Improvement 64 KBytes 3m34.036s 0m56.052s 73.8% 128 KBytes 6m2.644s 1m13.354s 79.8% 256 KBytes 11m22.202s 1m39.451s 85.4% imx6q-sabrelite: - CPU: 792 MHz x4 - Memory: 1GB DDR3 - SD Card: SanDisk 8GB Class 4 Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-05-23exfat: introduce mount option 'sys_tz'Chung-Chiang Cheng
EXFAT_TZ_VALID bit in {create,modify,access}_tz is corresponding to OffsetValid field in exfat specification [1]. When this bit isn't set, timestamps should be treated as having the same UTC offset as the current local time. Currently, there is an option 'time_offset' for users to specify the UTC offset for this issue. This patch introduces a new mount option 'sys_tz' to use system timezone as time offset. Link: [1] https://docs.microsoft.com/en-us/windows/win32/fileio/exfat-specification#74102-offsetvalid-field Signed-off-by: Chung-Chiang Cheng <cccheng@synology.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-05-23exfat: fix referencing wrong parent directory information after renamingYuezhang Mo
During renaming, the parent directory information maybe updated. But the file/directory still references to the old parent directory information. This bug will cause 2 problems. (1) The renamed file can not be written. [10768.175172] exFAT-fs (sda1): error, failed to bmap (inode : 7afd50e4 iblock : 0, err : -5) [10768.184285] exFAT-fs (sda1): Filesystem has been set read-only ash: write error: Input/output error (2) Some dentries of the renamed file/directory are not set to deleted after removing the file/directory. exfat_update_parent_info() is a workaround for the wrong parent directory information being used after renaming. Now that bug is fixed, this is no longer needed, so remove it. Fixes: 5f2aa075070c ("exfat: add inode operations") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-05-23Merge branch 'guilt/xfs-5.19-misc-3' into xfs-5.19-for-nextDave Chinner
2022-05-23xfs: share xattr name and value buffers when logging xattr updatesDarrick J. Wong
While running xfs/297 and generic/642, I noticed a crash in xfs_attri_item_relog when it tries to copy the attr name to the new xattri log item. I think what happened here was that we called ->iop_commit on the old attri item (which nulls out the pointers) as part of a log force at the same time that a chained attr operation was ongoing. The system was busy enough that at some later point, the defer ops operation decided it was necessary to relog the attri log item, but as we've detached the name buffer from the old attri log item, we can't copy it to the new one, and kaboom. I think there's a broader refcounting problem with LARP mode -- the setxattr code can return to userspace before the CIL actually formats and commits the log item, which results in a UAF bug. Therefore, the xattr log item needs to be able to retain a reference to the name and value buffers until the log items have completely cleared the log. Furthermore, each time we create an intent log item, we allocate new memory and (re)copy the contents; sharing here would be very useful. Solve the UAF and the unnecessary memory allocations by having the log code create a single refcounted buffer to contain the name and value contents. This buffer can be passed from old to new during a relog operation, and the logging code can (optionally) attach it to the xfs_attr_item for reuse when LARP mode is enabled. This also fixes a problem where the xfs_attri_log_item objects weren't being freed back to the same cache where they came from. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-23xfs: do not use logged xattr updates on V4 filesystemsDarrick J. Wong
V4 superblocks do not contain the log_incompat feature bit, which means that we cannot protect xattr log items against kernels that are too old to know how to recover them. Turn off the log items for such filesystems and adjust the "delayed" name to reflect what it's really controlling. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-22afs: Adjust ACK interpretation to try and cope with NATDavid Howells
If a client's address changes, say if it is NAT'd, this can disrupt an in progress operation. For most operations, this is not much of a problem, but StoreData can be different as some servers modify the target file as the data comes in, so if a store request is disrupted, the file can get corrupted on the server. The problem is that the server doesn't recognise packets that come after the change of address as belonging to the original client and will bounce them, either by sending an OUT_OF_SEQUENCE ACK to the apparent new call if the packet number falls within the initial sequence number window of a call or by sending an EXCEEDS_WINDOW ACK if it falls outside and then aborting it. In both cases, firstPacket will be 1 and previousPacket will be 0 in the ACK information. Fix this by the following means: (1) If a client call receives an EXCEEDS_WINDOW ACK with firstPacket as 1 and previousPacket as 0, assume this indicates that the server saw the incoming packets from a different peer and thus as a different call. Fail the call with error -ENETRESET. (2) Also fail the call if a similar OUT_OF_SEQUENCE ACK occurs if the first packet has been hard-ACK'd. If it hasn't been hard-ACK'd, the ACK packet will cause it to get retransmitted, so the call will just be repeated. (3) Make afs_select_fileserver() treat -ENETRESET as a straight fail of the operation. (4) Prioritise the error code over things like -ECONNRESET as the server did actually respond. (5) Make writeback treat -ENETRESET as a retryable error and make it redirty all the pages involved in a write so that the VM will retry. Note that there is still a circumstance that I can't easily deal with: if the operation is fully received and processed by the server, but the reply is lost due to address change. There's no way to know if the op happened. We can examine the server, but a conflicting change could have been made by a third party - and we can't tell the difference. In such a case, a message like: kAFS: vnode modified {100058:146266} b7->b8 YFS.StoreData64 (op=2646a) will be logged to dmesg on the next op to touch the file and the client will reset the inode state, including invalidating clean parts of the pagecache. Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2021-December/004811.html # v1 Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22rxrpc, afs: Fix selection of abort codesDavid Howells
The RX_USER_ABORT code should really only be used to indicate that the user of the rxrpc service (ie. userspace) implicitly caused a call to be aborted - for instance if the AF_RXRPC socket is closed whilst the call was in progress. (The user may also explicitly abort a call and specify the abort code to use). Change some of the points of generation to use other abort codes instead: (1) Abort the call with RXGEN_SS_UNMARSHAL or RXGEN_CC_UNMARSHAL if we see ENOMEM and EFAULT during received data delivery and abort with RX_CALL_DEAD in the default case. (2) Abort with RXGEN_SS_MARSHAL if we get ENOMEM whilst trying to send a reply. (3) Abort with RX_CALL_DEAD if we stop hearing from the peer if we had heard from the peer and abort with RX_CALL_TIMEOUT if we hadn't. (4) Abort with RX_CALL_DEAD if we try to disconnect a call that's not completed successfully or been aborted. Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22rxrpc: Fix locking issueDavid Howells
There's a locking issue with the per-netns list of calls in rxrpc. The pieces of code that add and remove a call from the list use write_lock() and the calls procfile uses read_lock() to access it. However, the timer callback function may trigger a removal by trying to queue a call for processing and finding that it's already queued - at which point it has a spare refcount that it has to do something with. Unfortunately, if it puts the call and this reduces the refcount to 0, the call will be removed from the list. Unfortunately, since the _bh variants of the locking functions aren't used, this can deadlock. ================================ WARNING: inconsistent lock state 5.18.0-rc3-build4+ #10 Not tainted -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. ksoftirqd/2/25 [HC0[0]:SC1[1]:HE1:SE0] takes: ffff888107ac4038 (&rxnet->call_lock){+.?.}-{2:2}, at: rxrpc_put_call+0x103/0x14b {SOFTIRQ-ON-W} state was registered at: ... Possible unsafe locking scenario: CPU0 ---- lock(&rxnet->call_lock); <Interrupt> lock(&rxnet->call_lock); *** DEADLOCK *** 1 lock held by ksoftirqd/2/25: #0: ffff8881008ffdb0 ((&call->timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x23d Changes ======= ver #2) - Changed to using list_next_rcu() rather than rcu_dereference() directly. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22afs: Fix afs_getattr() to refetch file status if callback break occurredDavid Howells
If a callback break occurs (change notification), afs_getattr() needs to issue an FS.FetchStatus RPC operation to update the status of the file being examined by the stat-family of system calls. Fix afs_getattr() to do this if AFS_VNODE_CB_PROMISED has been cleared on a vnode by a callback break. Skip this if AT_STATX_DONT_SYNC is set. This can be tested by appending to a file on one AFS client and then using "stat -L" to examine its length on a machine running kafs. This can also be watched through tracing on the kafs machine. The callback break is seen: kworker/1:1-46 [001] ..... 978.910812: afs_cb_call: c=0000005f YFSCB.CallBack kworker/1:1-46 [001] ...1. 978.910829: afs_cb_break: 100058:23b4c:242d2c2 b=2 s=1 break-cb kworker/1:1-46 [001] ..... 978.911062: afs_call_done: c=0000005f ret=0 ab=0 [0000000082994ead] And then the stat command generated no traffic if unpatched, but with this change a call to fetch the status can be observed: stat-4471 [000] ..... 986.744122: afs_make_fs_call: c=000000ab 100058:023b4c:242d2c2 YFS.FetchStatus stat-4471 [000] ..... 986.745578: afs_call_done: c=000000ab ret=0 ab=0 [0000000087fc8c84] Fixes: 08e0e7c82eea ("[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC.") Reported-by: Markus Suvanto <markus.suvanto@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Tested-by: Markus Suvanto <markus.suvanto@gmail.com> Tested-by: kafs-testing+fedora34_64checkkafs-build-496@auristor.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=216010 Link: https://lore.kernel.org/r/165308359800.162686.14122417881564420962.stgit@warthog.procyon.org.uk/ # v1 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-22xfs: Remove duplicate includeJiapeng Chong
Clean up the following includecheck warning: ./fs/xfs/xfs_attr_item.c: xfs_inode.h is included more than once. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-22xfs: reduce IOCB_NOWAIT judgment for retry exclusive unaligned DIOKaixu Xia
Retry unaligned DIO with exclusive blocking semantics only when the IOCB_NOWAIT flag is not set. If we are doing nonblocking user I/O, propagate the error directly. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-22xfs: Remove dead codeJiapeng Chong
Remove tht entire xlog_recover_check_summary() function, this entire function is dead code and has been for 12 years. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-22xfs: fix typo in commentJulia Lawall
Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-22xfs: rename struct xfs_attr_item to xfs_attr_intentDarrick J. Wong
Everywhere else in XFS, structures that capture the state of an ongoing deferred work item all have names that end with "_intent". The new extended attribute deferred work items are not named as such, so fix it to follow the naming convention used elsewhere. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-22xfs: clean up state variable usage in xfs_attr_node_remove_attrDarrick J. Wong
The state variable is now a local variable pointing to a heap allocation, so we don't need to zero-initialize it, nor do we need the conditional to decide if we should free it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-22xfs: put attr[id] log item cache init with the othersDarrick J. Wong
Initialize and destroy the xattr log item caches in the same places that we do all the other log item caches. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-22xfs: remove struct xfs_attr_item.xattri_flagsDarrick J. Wong
Nobody uses this field, so get rid of it and the unused flag definition. Rearrange the structure layout to reduce its size from 104 to 96 bytes. This gets us from 39 to 42 objects per page. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-22xfs: use a separate slab cache for deferred xattr work stateDarrick J. Wong
Create a separate slab cache for struct xfs_attr_item objects, since we can pack the (104-byte) intent items more tightly than we can with the general slab cache objects. On x86, this means 39 intents per memory page instead of 32. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-22xfs: put the xattr intent item op flags in their own namespaceDarrick J. Wong
The flags that are stored in the extended attr intent log item really should have a separate namespace from the rest of the XFS_ATTR_* flags. Give them one to make it a little more obvious that they're intent item flags. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-22xfs: clean up xfs_attr_node_hasnameDarrick J. Wong
The calling conventions of this function are a mess -- callers /can/ provide a pointer to a pointer to a state structure, but it's not required, and as evidenced by the last two patches, the callers that do weren't be careful enough about how to deal with an existing da state. Push the allocation and freeing responsibilty to the callers, which means that callers from the xattr node state machine steps now have the visibility to allocate or free the da state structure as they please. As a bonus, the node remove/add paths for larp-mode replaces can reset the da state structure instead of freeing and immediately reallocating it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-22smb3: add trace point for oplock not foundSteve French
In order to debug problems with server potentially sending us an oplock that we don't recognize (or a race with close and oplock break) it would be helpful to have a dynamic trace point for this case. New tracepoint is called trace_smb3_oplock_not_found Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-22cifs: return the more nuanced writeback error on close()ChenXiaoSong
As filemap_check_errors() only report -EIO or -ENOSPC, we return more nuanced writeback error -(file->f_mapping->wb_err & MAX_ERRNO). filemap_write_and_wait filemap_write_and_wait_range filemap_check_errors -ENOSPC or -EIO filemap_check_wb_err errseq_check return -(file->f_mapping->wb_err & MAX_ERRNO) Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-21smb3: add trace point for lease not found issueSteve French
When trying to debug problems with server sending us a lease we don't recognize, it would be helpful to have a dynamic trace point for this case. New tracepoint is called trace_smb3_lease_not_found Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-21cifs: smbd: fix typo in commentJulia Lawall
Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-21ext4: fix bug_on in ext4_writepagesYe Bin
we got issue as follows: EXT4-fs error (device loop0): ext4_mb_generate_buddy:1141: group 0, block bitmap and bg descriptor inconsistent: 25 vs 31513 free cls ------------[ cut here ]------------ kernel BUG at fs/ext4/inode.c:2708! invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 2 PID: 2147 Comm: rep Not tainted 5.18.0-rc2-next-20220413+ #155 RIP: 0010:ext4_writepages+0x1977/0x1c10 RSP: 0018:ffff88811d3e7880 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88811c098000 RDX: 0000000000000000 RSI: ffff88811c098000 RDI: 0000000000000002 RBP: ffff888128140f50 R08: ffffffffb1ff6387 R09: 0000000000000000 R10: 0000000000000007 R11: ffffed10250281ea R12: 0000000000000001 R13: 00000000000000a4 R14: ffff88811d3e7bb8 R15: ffff888128141028 FS: 00007f443aed9740(0000) GS:ffff8883aef00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020007200 CR3: 000000011c2a4000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> do_writepages+0x130/0x3a0 filemap_fdatawrite_wbc+0x83/0xa0 filemap_flush+0xab/0xe0 ext4_alloc_da_blocks+0x51/0x120 __ext4_ioctl+0x1534/0x3210 __x64_sys_ioctl+0x12c/0x170 do_syscall_64+0x3b/0x90 It may happen as follows: 1. write inline_data inode vfs_write new_sync_write ext4_file_write_iter ext4_buffered_write_iter generic_perform_write ext4_da_write_begin ext4_da_write_inline_data_begin -> If inline data size too small will allocate block to write, then mapping will has dirty page ext4_da_convert_inline_data_to_extent ->clear EXT4_STATE_MAY_INLINE_DATA 2. fallocate do_vfs_ioctl ioctl_preallocate vfs_fallocate ext4_fallocate ext4_convert_inline_data ext4_convert_inline_data_nolock ext4_map_blocks -> fail will goto restore data ext4_restore_inline_data ext4_create_inline_data ext4_write_inline_data ext4_set_inode_state -> set inode EXT4_STATE_MAY_INLINE_DATA 3. writepages __ext4_ioctl ext4_alloc_da_blocks filemap_flush filemap_fdatawrite_wbc do_writepages ext4_writepages if (ext4_has_inline_data(inode)) BUG_ON(ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)) The root cause of this issue is we destory inline data until call ext4_writepages under delay allocation mode. But there maybe already convert from inline to extent. To solve this issue, we call filemap_flush first.. Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220516122634.1690462-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-21ext4: refactor and move ext4_ioctl_get_encryption_pwsalt()Ritesh Harjani
This patch move code for FS_IOC_GET_ENCRYPTION_PWSALT case into ext4's crypto.c file, i.e. ext4_ioctl_get_encryption_pwsalt() and uuid_is_zero(). This is mostly refactoring logic and should not affect any functionality change. Suggested-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/5af98b17152a96b245b4f7d2dfb8607fc93e36aa.1652595565.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-21ext4: cleanup function defs from ext4.h into crypto.cRitesh Harjani
Some of these functions when CONFIG_FS_ENCRYPTION is enabled are not really inline (let compiler be the best judge of it). Remove inline and move them into crypto.c where they should be present. Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/b7b9de2c7226298663fb5a0c28909135e2ab220f.1652595565.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-21ext4: move ext4 crypto code to its own file crypto.cRitesh Harjani
This is to cleanup super.c file which has grown quite large. So, start moving ext4 crypto related code to where it should be in the first place i.e. fs/ext4/crypto.c Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/7d637e093cbc34d727397e8d41a53a1b9ca7d7a4.1652595565.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-21ksmbd: fix outstanding credits related bugsHyunchul Lee
outstanding credits must be initialized to 0, because it means the sum of credits consumed by in-flight requests. And outstanding credits must be compared with total credits in smb2_validate_credit_charge(), because total credits are the sum of credits granted by ksmbd. This patch fix the following error, while frametest with Windows clients: Limits exceeding the maximum allowable outstanding requests, given : 128, pending : 8065 Fixes: b589f5db6d4a ("ksmbd: limits exceeding the maximum allowable outstanding requests") Cc: stable@vger.kernel.org Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Reported-by: Yufan Chen <wiz.chen@gmail.com> Tested-by: Yufan Chen <wiz.chen@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-21ksmbd: smbd: fix connection dropped issueHyunchul Lee
When there are bursty connection requests, RDMA connection event handler is deferred and Negotiation requests are received even if connection status is NEW. To handle it, set the status to CONNECTED if Negotiation requests are received. Reported-by: Yufan Chen <wiz.chen@gmail.com> Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Tested-by: Yufan Chen <wiz.chen@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>