summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2021-01-25NFSD: Update the MKNOD3args decoder to use struct xdr_streamChuck Lever
This commit removes the last usage of the original decode_sattr3(), so it is removed as a clean-up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25NFSD: Update the SYMLINK3args decoder to use struct xdr_streamChuck Lever
Similar to the WRITE decoder, code that checks the sanity of the payload size is re-wired to work with xdr_stream infrastructure. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25NFSD: Update the MKDIR3args decoder to use struct xdr_streamChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25NFSD: Update the CREATE3args decoder to use struct xdr_streamChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25NFSD: Update the SETATTR3args decoder to use struct xdr_streamChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25NFSD: Update the LINK3args decoder to use struct xdr_streamChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25NFSD: Update the RENAME3args decoder to use struct xdr_streamChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25NFSD: Update the NFSv3 DIROPargs decoder to use struct xdr_streamChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25NFSD: Update COMMIT3arg decoder to use struct xdr_streamChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25NFSD: Update READDIR3args decoders to use struct xdr_streamChuck Lever
As an additional clean up, neither nfsd3_proc_readdir() nor nfsd3_proc_readdirplus() make use of the dircount argument, so remove it from struct nfsd3_readdirargs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25NFSD: Add helper to set up the pages where the dirlist is encodedChuck Lever
De-duplicate some code that is used by both READDIR and READDIRPLUS to build the dirlist in the Reply. Because this code is not related to decoding READ arguments, it is moved to a more appropriate spot. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25NFSD: Fix returned READDIR offset cookieChuck Lever
Code inspection shows that the server's NFSv3 READDIR implementation handles offset cookies slightly differently than the NFSv2 READDIR, NFSv3 READDIRPLUS, and NFSv4 READDIR implementations, and there doesn't seem to be any need for this difference. As a clean up, I copied the logic from nfsd3_proc_readdirplus(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25NFSD: Update READLINK3arg decoder to use struct xdr_streamChuck Lever
The NFSv3 READLINK request takes a single filehandle, so it can re-use GETATTR's decoder. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25NFSD: Update WRITE3arg decoder to use struct xdr_streamChuck Lever
As part of the update, open code that sanity-checks the size of the data payload against the length of the RPC Call message has to be re-implemented to use xdr_stream infrastructure. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25NFSD: Update READ3arg decoder to use struct xdr_streamChuck Lever
The code that sets up rq_vec is refactored so that it is now adjacent to the nfsd_read() call site where it is used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25NFSD: Update ACCESS3arg decoder to use struct xdr_streamChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25NFSD: Update GETATTR3args decoder to use struct xdr_streamChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25SUNRPC: Make trace_svc_process() display the RPC procedure symbolicallyChuck Lever
The next few patches will employ these strings to help make server- side trace logs more human-readable. A similar technique is already in use in kernel RPC client code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-24Merge tag 'io_uring-5.11-2021-01-24' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "Still need a final cancelation fix that isn't quite done done, expected in the next day or two. That said, this contains: - Wakeup fix for IOPOLL requests - SQPOLL split close op handling fix - Ensure that any use of io_uring fd itself is marked as inflight - Short non-regular file read fix (Pavel) - Fix up bad false positive warning (Pavel) - SQPOLL fixes (Pavel) - In-flight removal fix (Pavel)" * tag 'io_uring-5.11-2021-01-24' of git://git.kernel.dk/linux-block: io_uring: account io_uring internal files as REQ_F_INFLIGHT io_uring: fix sleeping under spin in __io_clean_op io_uring: fix short read retries for non-reg files io_uring: fix SQPOLL IORING_OP_CLOSE cancelation state io_uring: fix skipping disabling sqo on exec io_uring: fix uring_flush in exit_files() warning io_uring: fix false positive sqo warning on flush io_uring: iopoll requests should also wake task ->in_idle state
2021-01-24Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "18 patches. Subsystems affected by this patch series: mm (pagealloc, memcg, kasan, memory-failure, and highmem), ubsan, proc, and MAINTAINERS" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: MAINTAINERS: add a couple more files to the Clang/LLVM section proc_sysctl: fix oops caused by incorrect command parameters powerpc/mm/highmem: use __set_pte_at() for kmap_local() mips/mm/highmem: use set_pte() for kmap_local() mm/highmem: prepare for overriding set_pte_at() sparc/mm/highmem: flush cache and TLB mm: fix page reference leak in soft_offline_page() ubsan: disable unsigned-overflow check for i386 kasan, mm: fix resetting page_alloc tags for HW_TAGS kasan, mm: fix conflicts with init_on_alloc/free kasan: fix HW_TAGS boot parameters kasan: fix incorrect arguments passing in kasan_add_zero_shadow kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow mm: fix numa stats for thp migration mm: memcg: fix memcg file_dirty numa stat mm: memcg/slab: optimize objcg stock draining mm: fix initialization of struct page for holes in memory layout x86/setup: don't remove E820_TYPE_RAM for pfn 0
2021-01-24Merge tag 'driver-core-5.11-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are some small driver core fixes for 5.11-rc5 that resolve some reported problems: - revert of a -rc1 patch that was causing problems with some machines - device link device name collision problem fix (busses only have to name devices unique to their bus, not unique to all busses) - kernfs splice bugfixes to resolve firmware loading problems for Qualcomm systems. - other tiny driver core fixes for minor issues reported. All of these have been in linux-next with no reported problems" * tag 'driver-core-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: driver core: Fix device link device name collision driver core: Extend device_is_dependent() kernfs: wire up ->splice_read and ->splice_write kernfs: implement ->write_iter kernfs: implement ->read_iter Revert "driver core: Reorder devices on successful probe" Driver core: platform: Add extra error check in devm_platform_get_irqs_affinity() drivers core: Free dma_range_map when driver probe failed
2021-01-24proc_sysctl: fix oops caused by incorrect command parametersXiaoming Ni
The process_sysctl_arg() does not check whether val is empty before invoking strlen(val). If the command line parameter () is incorrectly configured and val is empty, oops is triggered. For example: "hung_task_panic=1" is incorrectly written as "hung_task_panic", oops is triggered. The call stack is as follows: Kernel command line: .... hung_task_panic ...... Call trace: __pi_strlen+0x10/0x98 parse_args+0x278/0x344 do_sysctl_args+0x8c/0xfc kernel_init+0x5c/0xf4 ret_from_fork+0x10/0x30 To fix it, check whether "val" is empty when "phram" is a sysctl field. Error codes are returned in the failure branch, and error logs are generated by parse_args(). Link: https://lkml.kernel.org/r/20210118133029.28580-1-nixiaoming@huawei.com Fixes: 3db978d480e2843 ("kernel/sysctl: support setting sysctl parameters from kernel command line") Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: <stable@vger.kernel.org> [5.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-24Merge tag '5.11-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "An important signal handling patch for stable, and two small cleanup patches" * tag '5.11-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6: cifs: do not fail __smb_send_rqst if non-fatal signals are pending fs/cifs: Simplify bool comparison. fs/cifs: Assign boolean values to a bool variable
2021-01-24io_uring: account io_uring internal files as REQ_F_INFLIGHTJens Axboe
We need to actively cancel anything that introduces a potential circular loop, where io_uring holds a reference to itself. If the file in question is an io_uring file, then add the request to the inflight list. Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-24io_uring: fix sleeping under spin in __io_clean_opPavel Begunkov
[ 27.629441] BUG: sleeping function called from invalid context at fs/file.c:402 [ 27.631317] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1012, name: io_wqe_worker-0 [ 27.633220] 1 lock held by io_wqe_worker-0/1012: [ 27.634286] #0: ffff888105e26c98 (&ctx->completion_lock) {....}-{2:2}, at: __io_req_complete.part.102+0x30/0x70 [ 27.649249] Call Trace: [ 27.649874] dump_stack+0xac/0xe3 [ 27.650666] ___might_sleep+0x284/0x2c0 [ 27.651566] put_files_struct+0xb8/0x120 [ 27.652481] __io_clean_op+0x10c/0x2a0 [ 27.653362] __io_cqring_fill_event+0x2c1/0x350 [ 27.654399] __io_req_complete.part.102+0x41/0x70 [ 27.655464] io_openat2+0x151/0x300 [ 27.656297] io_issue_sqe+0x6c/0x14e0 [ 27.660991] io_wq_submit_work+0x7f/0x240 [ 27.662890] io_worker_handle_work+0x501/0x8a0 [ 27.664836] io_wqe_worker+0x158/0x520 [ 27.667726] kthread+0x134/0x180 [ 27.669641] ret_from_fork+0x1f/0x30 Instead of cleaning files on overflow, return back overflow cancellation into io_uring_cancel_files(). Previously it was racy to clean REQ_F_OVERFLOW flag, but we got rid of it, and can do it through repetitive attempts targeting all matching requests. Reported-by: Abaci <abaci@linux.alibaba.com> Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-23cifs: do not fail __smb_send_rqst if non-fatal signals are pendingRonnie Sahlberg
RHBZ 1848178 The original intent of returning an error in this function in the patch: "CIFS: Mask off signals when sending SMB packets" was to avoid interrupting packet send in the middle of sending the data (and thus breaking an SMB connection), but we also don't want to fail the request for non-fatal signals even before we have had a chance to try to send it (the reported problem could be reproduced e.g. by exiting a child process when the parent process was in the midst of calling futimens to update a file's timestamps). In addition, since the signal may remain pending when we enter the sending loop, we may end up not sending the whole packet before TCP buffers become full. In this case the code returns -EINTR but what we need here is to return -ERESTARTSYS instead to allow system calls to be restarted. Fixes: b30c74c73c78 ("CIFS: Mask off signals when sending SMB packets") Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-01-22Merge tag 'ceph-for-5.11-rc5' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "A patch to zero out sensitive cryptographic data and two minor cleanups prompted by the fact that a bunch of code was moved in this cycle" * tag 'ceph-for-5.11-rc5' of git://github.com/ceph/ceph-client: libceph: fix "Boolean result is used in bitwise operation" warning libceph, ceph: disambiguate ceph_connection_operations handlers libceph: zero out session key and connection secret
2021-01-22io_uring: fix short read retries for non-reg filesPavel Begunkov
Sockets and other non-regular files may actually expect short reads to happen, don't retry reads for them. Because non-reg files don't set FMODE_BUF_RASYNC and so it won't do second/retry do_read, we can filter out those cases after first do_read() attempt with ret>0. Cc: stable@vger.kernel.org # 5.9+ Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-22io_uring: fix SQPOLL IORING_OP_CLOSE cancelation stateJens Axboe
IORING_OP_CLOSE is special in terms of cancelation, since it has an intermediate state where we've removed the file descriptor but hasn't closed the file yet. For that reason, it's currently marked with IO_WQ_WORK_NO_CANCEL to prevent cancelation. This ensures that the op is always run even if canceled, to prevent leaving us with a live file but an fd that is gone. However, with SQPOLL, since a cancel request doesn't carry any resources on behalf of the request being canceled, if we cancel before any of the close op has been run, we can end up with io-wq not having the ->files assigned. This can result in the following oops reported by Joseph: BUG: kernel NULL pointer dereference, address: 00000000000000d8 PGD 800000010b76f067 P4D 800000010b76f067 PUD 10b462067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 1788 Comm: io_uring-sq Not tainted 5.11.0-rc4 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__lock_acquire+0x19d/0x18c0 Code: 00 00 8b 1d fd 56 dd 08 85 db 0f 85 43 05 00 00 48 c7 c6 98 7b 95 82 48 c7 c7 57 96 93 82 e8 9a bc f5 ff 0f 0b e9 2b 05 00 00 <48> 81 3f c0 ca 67 8a b8 00 00 00 00 41 0f 45 c0 89 04 24 e9 81 fe RSP: 0018:ffffc90001933828 EFLAGS: 00010002 RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000d8 RBP: 0000000000000246 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: ffff888106e8a140 R15: 00000000000000d8 FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000d8 CR3: 0000000106efa004 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: lock_acquire+0x31a/0x440 ? close_fd_get_file+0x39/0x160 ? __lock_acquire+0x647/0x18c0 _raw_spin_lock+0x2c/0x40 ? close_fd_get_file+0x39/0x160 close_fd_get_file+0x39/0x160 io_issue_sqe+0x1334/0x14e0 ? lock_acquire+0x31a/0x440 ? __io_free_req+0xcf/0x2e0 ? __io_free_req+0x175/0x2e0 ? find_held_lock+0x28/0xb0 ? io_wq_submit_work+0x7f/0x240 io_wq_submit_work+0x7f/0x240 io_wq_cancel_cb+0x161/0x580 ? io_wqe_wake_worker+0x114/0x360 ? io_uring_get_socket+0x40/0x40 io_async_find_and_cancel+0x3b/0x140 io_issue_sqe+0xbe1/0x14e0 ? __lock_acquire+0x647/0x18c0 ? __io_queue_sqe+0x10b/0x5f0 __io_queue_sqe+0x10b/0x5f0 ? io_req_prep+0xdb/0x1150 ? mark_held_locks+0x6d/0xb0 ? mark_held_locks+0x6d/0xb0 ? io_queue_sqe+0x235/0x4b0 io_queue_sqe+0x235/0x4b0 io_submit_sqes+0xd7e/0x12a0 ? _raw_spin_unlock_irq+0x24/0x30 ? io_sq_thread+0x3ae/0x940 io_sq_thread+0x207/0x940 ? do_wait_intr_irq+0xc0/0xc0 ? __ia32_sys_io_uring_enter+0x650/0x650 kthread+0x134/0x180 ? kthread_create_worker_on_cpu+0x90/0x90 ret_from_fork+0x1f/0x30 Fix this by moving the IO_WQ_WORK_NO_CANCEL until _after_ we've modified the fdtable. Canceling before this point is totally fine, and running it in the io-wq context _after_ that point is also fine. For 5.12, we'll handle this internally and get rid of the no-cancel flag, as IORING_OP_CLOSE is the only user of it. Cc: stable@vger.kernel.org Fixes: b5dba59e0cf7 ("io_uring: add support for IORING_OP_CLOSE") Reported-by: "Abaci <abaci@linux.alibaba.com>" Reviewed-and-tested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-21Merge tag 'fs_for_v5.11-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fs and udf fixes from Jan Kara: "A lazytime handling fix from Eric Biggers and a fix of UDF session handling for large devices" * tag 'fs_for_v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: fix the problem that the disc content is not displayed fs: fix lazytime expiration handling in __writeback_single_inode()
2021-01-21kernfs: wire up ->splice_read and ->splice_writeChristoph Hellwig
Wire up the splice_read and splice_write methods to the default helpers using ->read_iter and ->write_iter now that those are implemented for kernfs. This restores support to use splice and sendfile on kernfs files. Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops") Reported-by: Siddharth Gupta <sidgup@codeaurora.org> Tested-by: Siddharth Gupta <sidgup@codeaurora.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210120204631.274206-4-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-21kernfs: implement ->write_iterChristoph Hellwig
Switch kernfs to implement the write_iter method instead of plain old write to prepare to supporting splice and sendfile again. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210120204631.274206-3-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-21kernfs: implement ->read_iterChristoph Hellwig
Switch kernfs to implement the read_iter method instead of plain old read to prepare to supporting splice and sendfile again. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210120204631.274206-2-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-20Merge tag 'for-5.11-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more one line fixes for various bugs, stable material. - fix send when emitting clone operation from the same file and root - fix double free on error when cleaning backrefs - lockdep fix during relocation - handle potential error during reloc when starting transaction - skip running delayed refs during commit (leftover from code removal in this dev cycle)" * tag 'for-5.11-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: don't clear ret in btrfs_start_dirty_block_groups btrfs: fix lockdep splat in btrfs_recover_relocation btrfs: do not double free backref nodes on error btrfs: don't get an EINTR during drop_snapshot for reloc btrfs: send: fix invalid clone operations when cloning from the same file and root btrfs: no need to run delayed refs after commit_fs_roots during commit
2021-01-20cachefiles: Drop superfluous readpages aops NULL checkTakashi Iwai
After the recent actions to convert readpages aops to readahead, the NULL checks of readpages aops in cachefiles_read_or_alloc_page() may hit falsely. More badly, it's an ASSERT() call, and this panics. Drop the superfluous NULL checks for fixing this regression. [DH: Note that cachefiles never actually used readpages, so this check was never actually necessary] BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=208883 BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=1175245 Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem") Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-19Merge tag 'nfsd-5.11-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Avoid exposing parent of root directory in NFSv3 READDIRPLUS results - Fix a tracepoint change that went in the initial 5.11 merge * tag 'nfsd-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Move the svc_xdr_recvfrom tracepoint again nfsd4: readdirplus shouldn't return parent of export
2021-01-18btrfs: don't clear ret in btrfs_start_dirty_block_groupsJosef Bacik
If we fail to update a block group item in the loop we'll break, however we'll do btrfs_run_delayed_refs and lose our error value in ret, and thus not clean up properly. Fix this by only running the delayed refs if there was no failure. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-01-18btrfs: fix lockdep splat in btrfs_recover_relocationJosef Bacik
While testing the error paths of relocation I hit the following lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 5.10.0-rc6+ #217 Not tainted ------------------------------------------------------ mount/779 is trying to acquire lock: ffffa0e676945418 (&fs_info->balance_mutex){+.+.}-{3:3}, at: btrfs_recover_balance+0x2f0/0x340 but task is already holding lock: ffffa0e60ee31da8 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x100 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (btrfs-root-00){++++}-{3:3}: down_read_nested+0x43/0x130 __btrfs_tree_read_lock+0x27/0x100 btrfs_read_lock_root_node+0x31/0x40 btrfs_search_slot+0x462/0x8f0 btrfs_update_root+0x55/0x2b0 btrfs_drop_snapshot+0x398/0x750 clean_dirty_subvols+0xdf/0x120 btrfs_recover_relocation+0x534/0x5a0 btrfs_start_pre_rw_mount+0xcb/0x170 open_ctree+0x151f/0x1726 btrfs_mount_root.cold+0x12/0xea legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x10d/0x380 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 path_mount+0x433/0xc10 __x64_sys_mount+0xe3/0x120 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (sb_internal#2){.+.+}-{0:0}: start_transaction+0x444/0x700 insert_balance_item.isra.0+0x37/0x320 btrfs_balance+0x354/0xf40 btrfs_ioctl_balance+0x2cf/0x380 __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&fs_info->balance_mutex){+.+.}-{3:3}: __lock_acquire+0x1120/0x1e10 lock_acquire+0x116/0x370 __mutex_lock+0x7e/0x7b0 btrfs_recover_balance+0x2f0/0x340 open_ctree+0x1095/0x1726 btrfs_mount_root.cold+0x12/0xea legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x10d/0x380 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 path_mount+0x433/0xc10 __x64_sys_mount+0xe3/0x120 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Chain exists of: &fs_info->balance_mutex --> sb_internal#2 --> btrfs-root-00 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(btrfs-root-00); lock(sb_internal#2); lock(btrfs-root-00); lock(&fs_info->balance_mutex); *** DEADLOCK *** 2 locks held by mount/779: #0: ffffa0e60dc040e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0xb5/0x380 #1: ffffa0e60ee31da8 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x100 stack backtrace: CPU: 0 PID: 779 Comm: mount Not tainted 5.10.0-rc6+ #217 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack+0x8b/0xb0 check_noncircular+0xcf/0xf0 ? trace_call_bpf+0x139/0x260 __lock_acquire+0x1120/0x1e10 lock_acquire+0x116/0x370 ? btrfs_recover_balance+0x2f0/0x340 __mutex_lock+0x7e/0x7b0 ? btrfs_recover_balance+0x2f0/0x340 ? btrfs_recover_balance+0x2f0/0x340 ? rcu_read_lock_sched_held+0x3f/0x80 ? kmem_cache_alloc_trace+0x2c4/0x2f0 ? btrfs_get_64+0x5e/0x100 btrfs_recover_balance+0x2f0/0x340 open_ctree+0x1095/0x1726 btrfs_mount_root.cold+0x12/0xea ? rcu_read_lock_sched_held+0x3f/0x80 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x10d/0x380 ? __kmalloc_track_caller+0x2f2/0x320 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 ? capable+0x3a/0x60 path_mount+0x433/0xc10 __x64_sys_mount+0xe3/0x120 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This is straightforward to fix, simply release the path before we setup the balance_ctl. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-01-18btrfs: do not double free backref nodes on errorJosef Bacik
Zygo reported the following KASAN splat: BUG: KASAN: use-after-free in btrfs_backref_cleanup_node+0x18a/0x420 Read of size 8 at addr ffff888112402950 by task btrfs/28836 CPU: 0 PID: 28836 Comm: btrfs Tainted: G W 5.10.0-e35f27394290-for-next+ #23 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Call Trace: dump_stack+0xbc/0xf9 ? btrfs_backref_cleanup_node+0x18a/0x420 print_address_description.constprop.8+0x21/0x210 ? record_print_text.cold.34+0x11/0x11 ? btrfs_backref_cleanup_node+0x18a/0x420 ? btrfs_backref_cleanup_node+0x18a/0x420 kasan_report.cold.10+0x20/0x37 ? btrfs_backref_cleanup_node+0x18a/0x420 __asan_load8+0x69/0x90 btrfs_backref_cleanup_node+0x18a/0x420 btrfs_backref_release_cache+0x83/0x1b0 relocate_block_group+0x394/0x780 ? merge_reloc_roots+0x4a0/0x4a0 btrfs_relocate_block_group+0x26e/0x4c0 btrfs_relocate_chunk+0x52/0x120 btrfs_balance+0xe2e/0x1900 ? check_flags.part.50+0x6c/0x1e0 ? btrfs_relocate_chunk+0x120/0x120 ? kmem_cache_alloc_trace+0xa06/0xcb0 ? _copy_from_user+0x83/0xc0 btrfs_ioctl_balance+0x3a7/0x460 btrfs_ioctl+0x24c8/0x4360 ? __kasan_check_read+0x11/0x20 ? check_chain_key+0x1f4/0x2f0 ? __asan_loadN+0xf/0x20 ? btrfs_ioctl_get_supported_features+0x30/0x30 ? kvm_sched_clock_read+0x18/0x30 ? check_chain_key+0x1f4/0x2f0 ? lock_downgrade+0x3f0/0x3f0 ? handle_mm_fault+0xad6/0x2150 ? do_vfs_ioctl+0xfc/0x9d0 ? ioctl_file_clone+0xe0/0xe0 ? check_flags.part.50+0x6c/0x1e0 ? check_flags.part.50+0x6c/0x1e0 ? check_flags+0x26/0x30 ? lock_is_held_type+0xc3/0xf0 ? syscall_enter_from_user_mode+0x1b/0x60 ? do_syscall_64+0x13/0x80 ? rcu_read_lock_sched_held+0xa1/0xd0 ? __kasan_check_read+0x11/0x20 ? __fget_light+0xae/0x110 __x64_sys_ioctl+0xc3/0x100 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f4c4bdfe427 Allocated by task 28836: kasan_save_stack+0x21/0x50 __kasan_kmalloc.constprop.18+0xbe/0xd0 kasan_kmalloc+0x9/0x10 kmem_cache_alloc_trace+0x410/0xcb0 btrfs_backref_alloc_node+0x46/0xf0 btrfs_backref_add_tree_node+0x60d/0x11d0 build_backref_tree+0xc5/0x700 relocate_tree_blocks+0x2be/0xb90 relocate_block_group+0x2eb/0x780 btrfs_relocate_block_group+0x26e/0x4c0 btrfs_relocate_chunk+0x52/0x120 btrfs_balance+0xe2e/0x1900 btrfs_ioctl_balance+0x3a7/0x460 btrfs_ioctl+0x24c8/0x4360 __x64_sys_ioctl+0xc3/0x100 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 28836: kasan_save_stack+0x21/0x50 kasan_set_track+0x20/0x30 kasan_set_free_info+0x1f/0x30 __kasan_slab_free+0xf3/0x140 kasan_slab_free+0xe/0x10 kfree+0xde/0x200 btrfs_backref_error_cleanup+0x452/0x530 build_backref_tree+0x1a5/0x700 relocate_tree_blocks+0x2be/0xb90 relocate_block_group+0x2eb/0x780 btrfs_relocate_block_group+0x26e/0x4c0 btrfs_relocate_chunk+0x52/0x120 btrfs_balance+0xe2e/0x1900 btrfs_ioctl_balance+0x3a7/0x460 btrfs_ioctl+0x24c8/0x4360 __x64_sys_ioctl+0xc3/0x100 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This occurred because we freed our backref node in btrfs_backref_error_cleanup(), but then tried to free it again in btrfs_backref_release_cache(). This is because btrfs_backref_release_cache() will cycle through all of the cache->leaves nodes and free them up. However btrfs_backref_error_cleanup() freed the backref node with btrfs_backref_free_node(), which simply kfree()d the backref node without unlinking it from the cache. Change this to a btrfs_backref_drop_node(), which does the appropriate cleanup and removes the node from the cache->leaves list, so when we go to free the remaining cache we don't trip over items we've already dropped. Fixes: 75bfb9aff45e ("Btrfs: cleanup error handling in build_backref_tree") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-01-18btrfs: don't get an EINTR during drop_snapshot for relocJosef Bacik
This was partially fixed by f3e3d9cc3525 ("btrfs: avoid possible signal interruption of btrfs_drop_snapshot() on relocation tree"), however it missed a spot when we restart a trans handle because we need to end the transaction. The fix is the same, simply use btrfs_join_transaction() instead of btrfs_start_transaction() when deleting reloc roots. Fixes: f3e3d9cc3525 ("btrfs: avoid possible signal interruption of btrfs_drop_snapshot() on relocation tree") CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-01-18udf: fix the problem that the disc content is not displayedlianzhi chang
When the capacity of the disc is too large (assuming the 4.7G specification), the disc (UDF file system) will be burned multiple times in the windows (Multisession Usage). When the remaining capacity of the CD is less than 300M (estimated value, for reference only), open the CD in the Linux system, the content of the CD is displayed as blank (the kernel will say "No VRS found"). Windows can display the contents of the CD normally. Through analysis, in the "fs/udf/super.c": udf_check_vsd function, the actual value of VSD_MAX_SECTOR_OFFSET may be much larger than 0x800000. According to the current code logic, it is found that the type of sbi->s_session is "__s32", when the remaining capacity of the disc is less than 300M (take a set of test values: sector=3154903040, sbi->s_session=1540464, sb->s_blocksize_bits=11 ), the calculation result of "sbi->s_session << sb->s_blocksize_bits" will overflow. Therefore, it is necessary to convert the type of s_session to "loff_t" (when udf_check_vsd starts, assign a value to _sector, which is also converted in this way), so that the result will not overflow, and then the content of the disc can be displayed normally. Link: https://lore.kernel.org/r/20210114075741.30448-1-changlianzhi@uniontech.com Signed-off-by: lianzhi chang <changlianzhi@uniontech.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-01-17fs/cifs: Simplify bool comparison.Jiapeng Zhong
Fix the follow warnings: ./fs/cifs/connect.c: WARNING: Comparison of 0/1 to bool variable Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-01-17fs/cifs: Assign boolean values to a bool variableJiapeng Zhong
Fix the following coccicheck warnings: ./fs/cifs/connect.c:3386:2-21: WARNING: Assignment of 0/1 to bool variable. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-01-17Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull misc vfs fixes from Al Viro: "Several assorted fixes. I still think that audit ->d_name race is better fixed this way for the benefit of backports, with any possibly fancier variants done on top of it" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: dump_common_audit_data(): fix racy accesses to ->d_name iov_iter: fix the uaccess area in copy_compat_iovec_from_user umount(2): move the flag validity checks first
2021-01-16io_uring: fix skipping disabling sqo on execPavel Begunkov
If there are no requests at the time __io_uring_task_cancel() is called, tctx_inflight() returns zero and and it terminates not getting a chance to go through __io_uring_files_cancel() and do io_disable_sqo_submit(). And we absolutely want them disabled by the time cancellation ends. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Fixes: d9d05217cb69 ("io_uring: stop SQPOLL submit on creator's death") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-16io_uring: fix uring_flush in exit_files() warningPavel Begunkov
WARNING: CPU: 1 PID: 11100 at fs/io_uring.c:9096 io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096 RIP: 0010:io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096 Call Trace: filp_close+0xb4/0x170 fs/open.c:1280 close_files fs/file.c:401 [inline] put_files_struct fs/file.c:416 [inline] put_files_struct+0x1cc/0x350 fs/file.c:413 exit_files+0x7e/0xa0 fs/file.c:433 do_exit+0xc22/0x2ae0 kernel/exit.c:820 do_group_exit+0x125/0x310 kernel/exit.c:922 get_signal+0x3e9/0x20a0 kernel/signal.c:2770 arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811 handle_signal_work kernel/entry/common.c:147 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline] syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302 entry_SYSCALL_64_after_hwframe+0x44/0xa9 An SQPOLL ring creator task may have gotten rid of its file note during exit and called io_disable_sqo_submit(), but the io_uring is still left referenced through fdtable, which will be put during close_files() and cause a false positive warning. First split the warning into two for more clarity when is hit, and the add sqo_dead check to handle the described case. Reported-by: syzbot+a32b546d58dde07875a1@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-16io_uring: fix false positive sqo warning on flushPavel Begunkov
WARNING: CPU: 1 PID: 9094 at fs/io_uring.c:8884 io_disable_sqo_submit+0x106/0x130 fs/io_uring.c:8884 Call Trace: io_uring_flush+0x28b/0x3a0 fs/io_uring.c:9099 filp_close+0xb4/0x170 fs/open.c:1280 close_fd+0x5c/0x80 fs/file.c:626 __do_sys_close fs/open.c:1299 [inline] __se_sys_close fs/open.c:1297 [inline] __x64_sys_close+0x2f/0xa0 fs/open.c:1297 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 io_uring's final close() may be triggered by any task not only the creator. It's well handled by io_uring_flush() including SQPOLL case, though a warning in io_disable_sqo_submit() will fallaciously fire by moving this warning out to the only call site that matters. Reported-by: syzbot+2f5d1785dc624932da78@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-16io_uring: iopoll requests should also wake task ->in_idle stateJens Axboe
If we're freeing/finishing iopoll requests, ensure we check if the task is in idling in terms of cancelation. Otherwise we could end up waiting forever in __io_uring_task_cancel() if the task has active iopoll requests that need cancelation. Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-16Merge tag 'io_uring-5.11-2021-01-16' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "We still have a pending fix for a cancelation issue, but it's still being investigated. In the meantime: - Dead mm handling fix (Pavel) - SQPOLL setup error handling (Pavel) - Flush timeout sequence fix (Marcelo) - Missing finish_wait() for one exit case" * tag 'io_uring-5.11-2021-01-16' of git://git.kernel.dk/linux-block: io_uring: ensure finish_wait() is always called in __io_uring_task_cancel() io_uring: flush timeouts that should already have expired io_uring: do sqo disable on install_fd error io_uring: fix null-deref in io_disable_sqo_submit io_uring: don't take files/mm for a dead task io_uring: drop mm and files after task_work_run
2021-01-16mm: don't play games with pinned pages in clear_page_refsLinus Torvalds
Turning a pinned page read-only breaks the pinning after COW. Don't do it. The whole "track page soft dirty" state doesn't work with pinned pages anyway, since the page might be dirtied by the pinning entity without ever being noticed in the page tables. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>