summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2023-06-09mm/gup: remove vmas parameter from get_user_pages_remote()Lorenzo Stoakes
The only instances of get_user_pages_remote() invocations which used the vmas parameter were for a single page which can instead simply look up the VMA directly. In particular:- - __update_ref_ctr() looked up the VMA but did nothing with it so we simply remove it. - __access_remote_vm() was already using vma_lookup() when the original lookup failed so by doing the lookup directly this also de-duplicates the code. We are able to perform these VMA operations as we already hold the mmap_lock in order to be able to call get_user_pages_remote(). As part of this work we add get_user_page_vma_remote() which abstracts the VMA lookup, error handling and decrementing the page reference count should the VMA lookup fail. This forms part of a broader set of patches intended to eliminate the vmas parameter altogether. [akpm@linux-foundation.org: avoid passing NULL to PTR_ERR] Link: https://lkml.kernel.org/r/d20128c849ecdbf4dd01cc828fcec32127ed939a.1684350871.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> (for arm64) Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> (for s390) Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Christian König <christian.koenig@amd.com> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09mm: pagemap: restrict pagewalk to the requested rangeYuanchu Xie
The pagewalk in pagemap_read reads one PTE past the end of the requested range, and stops when the buffer runs out of space. While it produces the right result, the extra read is unnecessary and less performant. I timed the following command before and after this patch: dd count=100000 if=/proc/self/pagemap of=/dev/null The results are consistently within 0.001s across 5 runs. Before: 100000+0 records in 100000+0 records out 51200000 bytes (51 MB) copied, 0.0763159 s, 671 MB/s real 0m0.078s user 0m0.012s sys 0m0.065s After: 100000+0 records in 100000+0 records out 51200000 bytes (51 MB) copied, 0.0487928 s, 1.0 GB/s real 0m0.050s user 0m0.011s sys 0m0.039s Link: https://lkml.kernel.org/r/20230515172608.3558391-1-yuanchu@google.com Signed-off-by: Yuanchu Xie <yuanchu@google.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09fs: hugetlbfs: set vma policy only when needed for allocating folioAckerley Tng
Calling hugetlb_set_vma_policy() later avoids setting the vma policy and then dropping it on a page cache hit. Link: https://lkml.kernel.org/r/20230502235622.3652586-1-ackerleytng@google.com Signed-off-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Erdem Aktas <erdemaktas@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Vishal Annapurve <vannapurve@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09writeback: move wb_over_bg_thresh() call outside lock sectionYosry Ahmed
Patch series "cgroup: eliminate atomic rstat flushing", v5. A previous patch series [1] changed most atomic rstat flushing contexts to become non-atomic. This was done to avoid an expensive operation that scales with # cgroups and # cpus to happen with irqs disabled and scheduling not permitted. There were two remaining atomic flushing contexts after that series. This series tries to eliminate them as well, eliminating atomic rstat flushing completely. The two remaining atomic flushing contexts are: (a) wb_over_bg_thresh()->mem_cgroup_wb_stats() (b) mem_cgroup_threshold()->mem_cgroup_usage() For (a), flushing needs to be atomic as wb_writeback() calls wb_over_bg_thresh() with a spinlock held. However, it seems like the call to wb_over_bg_thresh() doesn't need to be protected by that spinlock, so this series proposes a refactoring that moves the call outside the lock criticial section and makes the stats flushing in mem_cgroup_wb_stats() non-atomic. For (b), flushing needs to be atomic as mem_cgroup_threshold() is called with irqs disabled. We only flush the stats when calculating the root usage, as it is approximated as the sum of some memcg stats (file, anon, and optionally swap) instead of the conventional page counter. This series proposes changing this calculation to use the global stats instead, eliminating the need for a memcg stat flush. After these 2 contexts are eliminated, we no longer need mem_cgroup_flush_stats_atomic() or cgroup_rstat_flush_atomic(). We can remove them and simplify the code. [1] https://lore.kernel.org/linux-mm/20230330191801.1967435-1-yosryahmed@google.com/ This patch (of 5): wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat flush, which can be expensive on large systems. Currently, wb_writeback() calls wb_over_bg_thresh() within a lock section, so we have to do the rstat flush atomically. On systems with a lot of cpus and/or cgroups, this can cause us to disable irqs for a long time, potentially causing problems. Move the call to wb_over_bg_thresh() outside the lock section in preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic. The list_empty(&wb->work_list) check should be okay outside the lock section of wb->list_lock as it is protected by a separate lock (wb->work_lock), and wb_over_bg_thresh() doesn't seem like it is modifying any of wb->b_* lists the wb->list_lock is protecting. Also, the loop seems to be already releasing and reacquring the lock, so this refactoring looks safe. Link: https://lkml.kernel.org/r/20230421174020.2994750-1-yosryahmed@google.com Link: https://lkml.kernel.org/r/20230421174020.2994750-2-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-26Merge tag 'for-6.4-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - handle memory allocation error in checksumming helper (reported by syzbot) - fix lockdep splat when aborting a transaction, add NOFS protection around invalidate_inode_pages2 that could allocate with GFP_KERNEL - reduce chances to hit an ENOSPC during scrub with RAID56 profiles * tag 'for-6.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: use nofs when cleaning up aborted transactions btrfs: handle memory allocation failure in btrfs_csum_one_bio btrfs: scrub: try harder to mark RAID56 block groups read-only
2023-05-25Merge tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb directory moves and client fixes from Steve French: "Four smb3 client fixes (three of which marked for stable) and three patches to move of fs/cifs and fs/ksmbd to a new common "fs/smb" parent directory - Move the client and server source directories to a common parent directory: fs/cifs -> fs/smb/client fs/ksmbd -> fs/smb/server fs/smbfs_common -> fs/smb/common - important readahead fix - important fix for SMB1 regression - fix for missing mount option ("mapchars") in mount API conversion - minor debugging improvement" * tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb3: move Documentation/filesystems/cifs to Documentation/filesystems/smb cifs: correct references in Documentation to old fs/cifs path smb: move client and server files to common directory fs/smb cifs: mapchars mount option ignored smb3: display debug information better for encryption cifs: fix smb1 mount regression cifs: Fix cifs_limit_bvec_subset() to correctly check the maxmimum size
2023-05-25Merge tag 'vfs/v6.4-rc3/misc.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - During the acl rework we merged this cycle the generic_listxattr() helper had to be modified in a way that in principle it would allow for POSIX ACLs to be reported. At least that was the impression we had initially. Because before the acl rework POSIX ACLs would be reported if the filesystem did have POSIX ACL xattr handlers in sb->s_xattr. That logic changed and now we can simply check whether the superblock has SB_POSIXACL set and if the inode has inode->i_{default_}acl set report the appropriate POSIX ACL name. However, we didn't realize that generic_listxattr() was only ever used by two filesystems. Both of them don't support POSIX ACLs via sb->s_xattr handlers and so never reported POSIX ACLs via generic_listxattr() even if they raised SB_POSIXACL and did contain inodes which had acls set. The example here is nfs4. As a result, generic_listxattr() suddenly started reporting POSIX ACLs when it wouldn't have before. Since SB_POSIXACL implies that the umask isn't stripped in the VFS nfs4 can't just drop SB_POSIXACL from the superblock as it would also alter umask handling for them. So just have generic_listxattr() not report POSIX ACLs as it never did anyway. It's documented as such. - Our SB_* flags currently use a signed integer and we shift the last bit causing UBSAN to complain about undefined behavior. Switch to using unsigned. While the original patch used an explicit unsigned bitshift it's now pretty common to rely on the BIT() macro in a lot of headers nowadays. So the patch has been adjusted to use that. - Add Namjae as ntfs reviewer. They're already active this cycle so let's make it explicit right now. * tag 'vfs/v6.4-rc3/misc.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: ntfs: Add myself as a reviewer fs: don't call posix_acl_listxattr in generic_listxattr fs: fix undefined behavior in bit shift for SB_NOUSER
2023-05-24cifs: correct references in Documentation to old fs/cifs pathSteve French
The fs/cifs directory has moved to fs/smb/client, correct mentions of this in Documentation and comments. Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-24smb: move client and server files to common directory fs/smbSteve French
Move CIFS/SMB3 related client and server files (cifs.ko and ksmbd.ko and helper modules) to new fs/smb subdirectory: fs/cifs --> fs/smb/client fs/ksmbd --> fs/smb/server fs/smbfs_common --> fs/smb/common Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-24cifs: mapchars mount option ignoredSteve French
There are two ways that special characters (not allowed in some other operating systems like Windows, but allowed in POSIX) have been mapped in the past ("SFU" and "SFM" mappings) to allow them to be stored in a range reserved for special chars. The default for Linux has been to use "mapposix" (ie the SFM mapping) but the conversion to the new mount API in the 5.11 kernel broke the ability to override the default mapping of the reserved characters (like '?' and '*' and '\') via "mapchars" mount option. This patch fixes that - so can now mount with "mapchars" mount option to override the default ("mapposix" ie SFM) mapping. Reported-by: Tyler Spivey <tspivey8@gmail.com> Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api") Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-24smb3: display debug information better for encryptionSteve French
Fix /proc/fs/cifs/DebugData to use the same case for "encryption" (ie "Encryption" with init capital letter was used in one place). In addition, if gcm256 encryption (intead of gcm128) is used on a connection to a server, note that in the DebugData as well. It now displays (when gcm256 negotiated): Security type: RawNTLMSSP SessionId: 0x86125800bc000b0d encrypted(gcm256) Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-24cifs: fix smb1 mount regressionPaulo Alcantara
cifs.ko maps NT_STATUS_NOT_FOUND to -EIO when SMB1 servers couldn't resolve referral paths. Proceed to tree connect when we get -EIO from dfs_get_referral() as well. Reported-by: Kris Karas (Bug Reporting) <bugs-a21@moonlit-rail.com> Tested-by: Woody Suwalski <terraluna977@gmail.com> Fixes: 8e3554150d6c ("cifs: fix sharing of DFS connections") Cc: stable@vger.kernel.org # v6.2+ Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-23cifs: Fix cifs_limit_bvec_subset() to correctly check the maxmimum sizeDavid Howells
Fix cifs_limit_bvec_subset() so that it limits the span to the maximum specified and won't return with a size greater than max_size. Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list") Cc: stable@vger.kernel.org # 6.3 Reported-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <smfrench@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Paulo Alcantara <pc@manguebit.com> cc: Tom Talpey <tom@talpey.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-23Merge tag 'erofs-for-6.4-rc4-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "One patch addresses a null-ptr-deref issue reported by syzbot weeks ago, which is caused by the new long xattr name prefix feature and needs to be fixed. The remaining two patches are minor cleanups to avoid unnecessary compilation and adjust per-cpu kworker configuration. Summary: - Fix null-ptr-deref related to long xattr name prefixes - Avoid pcpubuf compilation if CONFIG_EROFS_FS_ZIP is off - Use high priority kthreads by default if per-cpu kthread workers are enabled" * tag 'erofs-for-6.4-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: use HIPRI by default if per-cpu kthreads are enabled erofs: avoid pcpubuf.c inclusion if CONFIG_EROFS_FS_ZIP is off erofs: fix null-ptr-deref caused by erofs_xattr_prefixes_init
2023-05-23erofs: use HIPRI by default if per-cpu kthreads are enabledGao Xiang
As Sandeep shown [1], high priority RT per-cpu kthreads are typically helpful for Android scenarios to minimize the scheduling latencies. Switch EROFS_FS_PCPU_KTHREAD_HIPRI on by default if EROFS_FS_PCPU_KTHREAD is on since it's the typical use cases for EROFS_FS_PCPU_KTHREAD. Also clean up unneeded sched_set_normal(). [1] https://lore.kernel.org/r/CAB=BE-SBtO6vcoyLNA9F-9VaN5R0t3o_Zn+FW8GbO6wyUqFneQ@mail.gmail.com Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230522092141.124290-1-hsiangkao@linux.alibaba.com
2023-05-23erofs: avoid pcpubuf.c inclusion if CONFIG_EROFS_FS_ZIP is offYue Hu
The function of pcpubuf.c is just for low-latency decompression algorithms (e.g. lz4). Signed-off-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230515095758.10391-1-zbestahu@gmail.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-05-23erofs: fix null-ptr-deref caused by erofs_xattr_prefixes_initJingbo Xu
Fragments and dedupe share one feature bit, and thus packed inode may not exist when fragment feature bit (dedupe feature bit exactly) is set, e.g. when deduplication feature is in use while fragments feature is not. In this case, sbi->packed_inode could be NULL while fragments feature bit is set. Fix this by accessing packed inode only when it exists. Reported-by: syzbot+902d5a9373ae8f748a94@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=902d5a9373ae8f748a94 Reported-and-tested-by: syzbot+bbb353775d51424087f2@syzkaller.appspotmail.com Fixes: 9e382914617c ("erofs: add helpers to load long xattr name prefixes") Fixes: 6a318ccd7e08 ("erofs: enable long extended attribute name prefixes") Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230515103941.129784-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-05-22Merge tag 'nfs-for-6.4-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client fixes from Anna Schumaker: "Stable Fix: - Don't change task->tk_status after the call to rpc_exit_task Other Bugfixes: - Convert kmap_atomic() to kmap_local_folio() - Fix a potential double free with READ_PLUS" * tag 'nfs-for-6.4-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4.2: Fix a potential double free with READ_PLUS SUNRPC: Don't change task->tk_status after the call to rpc_exit_task NFS: Convert kmap_atomic() to kmap_local_folio()
2023-05-21Merge tag '6.4-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull ksmbd server fixes from Steve French: - two fixes for incorrect SMB3 message validation (one for client which uses 8 byte padding, and one for empty bcc) - two fixes for out of bounds bugs: one for username offset checks (in session setup) and the other for create context name length checks in open requests * tag '6.4-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: smb2: Allow messages padded to 8byte boundary ksmbd: allocate one more byte for implied bcc[0] ksmbd: fix wrong UserName check in session_user ksmbd: fix global-out-of-bounds in smb2_find_context_vals
2023-05-21Merge tag '6.4-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs client fixes from Steve French: "Two smb3 client fixes, both related to deferred close, and also for stable: - send close for deferred handles before not after lease break response to avoid possible sharing violations - check all opens on an inode (looking for deferred handles) when lease break is returned not just the handle the lease break came in on" * tag '6.4-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: SMB3: drop reference to cfile before sending oplock break SMB3: Close all deferred handles of inode in case of handle lease break
2023-05-19NFSv4.2: Fix a potential double free with READ_PLUSAnna Schumaker
kfree()-ing the scratch page isn't enough, we also need to set the pointer back to NULL to avoid a double-free in the case of a resend. Fixes: fbd2a05f29a9 (NFSv4.2: Rework scratch handling for READ_PLUS) Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-05-19NFS: Convert kmap_atomic() to kmap_local_folio()Fabio M. De Francesco
kmap_atomic() is deprecated in favor of kmap_local_{folio,page}(). Therefore, replace kmap_atomic() with kmap_local_folio() in nfs_readdir_folio_array_append(). kmap_atomic() disables page-faults and preemption (the latter only for !PREEMPT_RT kernels), However, the code within the mapping/un-mapping in nfs_readdir_folio_array_append() does not depend on the above-mentioned side effects. Therefore, a mere replacement of the old API with the new one is all that is required (i.e., there is no need to explicitly add any calls to pagefault_disable() and/or preempt_disable()). Tested with (x)fstests in a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with HIGHMEM64GB enabled. Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Fixes: ec108d3cc766 ("NFS: Convert readdir page array functions to use a folio") Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-05-19Merge tag 'ceph-for-6.4-rc3' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "A workaround for a just discovered bug in MClientSnap encoding which goes back to 2017 (marked for stable) and a fixup to quieten a static checker" * tag 'ceph-for-6.4-rc3' of https://github.com/ceph/ceph-client: ceph: force updating the msg pointer in non-split case ceph: silence smatch warning in reconnect_caps_cb()
2023-05-19Merge tag 's390-6.4-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Alexander Gordeev: - Add check whether the required facilities are installed before using the s390-specific ChaCha20 implementation - Key blobs for s390 protected key interface IOCTLs commands PKEY_VERIFYKEY2 and PKEY_VERIFYKEY3 may contain clear key material. Zeroize copies of these keys in kernel memory after creating protected keys - Set CONFIG_INIT_STACK_NONE=y in defconfigs to avoid extra overhead of initializing all stack variables by default - Make sure that when a new channel-path is enabled all subchannels are evaluated: with and without any devices connected on it - When SMT thread CPUs are added to CPU topology masks the nr_cpu_ids limit is not checked and could be exceeded. Respect the nr_cpu_ids limit and avoid a warning when CONFIG_DEBUG_PER_CPU_MAPS is set - The pointer to IPL Parameter Information Block is stored in the absolute lowcore as a virtual address. Save it as the physical address for later use by dump tools - Fix a Queued Direct I/O (QDIO) problem on z/VM guests using QIOASSIST with dedicated (pass through) QDIO-based devices such as FCP, real OSA or HiperSockets - s390's struct statfs and struct statfs64 contain padding, which field-by-field copying does not set. Initialize the respective structures with zeros before filling them and copying to userspace - Grow s390 compat_statfs64, statfs and statfs64 structures f_spare array member to cover padding and simplify things - Remove obsolete SCHED_BOOK and SCHED_DRAWER configs - Remove unneeded S390_CCW_IOMMU and S390_AP_IOM configs * tag 's390-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/iommu: get rid of S390_CCW_IOMMU and S390_AP_IOMMU s390/Kconfig: remove obsolete configs SCHED_{BOOK,DRAWER} s390/uapi: cover statfs padding by growing f_spare statfs: enforce statfs[64] structure initialization s390/qdio: fix do_sqbs() inline assembly constraint s390/ipl: fix IPIB virtual vs physical address confusion s390/topology: honour nr_cpu_ids when adding CPUs s390/cio: include subchannels without devices also for evaluation s390/defconfigs: set CONFIG_INIT_STACK_NONE=y s390/pkey: zeroize key blobs s390/crypto: use vector instructions only if available for ChaCha20
2023-05-18Merge tag 'mm-hotfixes-stable-2023-05-18-15-52' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Eight hotfixes. Four are cc:stable, the other four are for post-6.4 issues, or aren't considered suitable for backporting" * tag 'mm-hotfixes-stable-2023-05-18-15-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: MAINTAINERS: Cleanup Arm Display IP maintainers MAINTAINERS: repair pattern in DIALOG SEMICONDUCTOR DRIVERS nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode() mm: fix zswap writeback race condition mm: kfence: fix false positives on big endian zsmalloc: move LRU update from zs_map_object() to zs_malloc() mm: shrinkers: fix race condition on debugfs cleanup maple_tree: make maple state reusable after mas_empty_area()
2023-05-18ceph: force updating the msg pointer in non-split caseXiubo Li
When the MClientSnap reqeust's op is not CEPH_SNAP_OP_SPLIT the request may still contain a list of 'split_realms', and we need to skip it anyway. Or it will be parsed as a corrupt snaptrace. Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/61200 Reported-by: Frank Schilder <frans@dtu.dk> Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-05-18ceph: silence smatch warning in reconnect_caps_cb()Xiubo Li
Smatch static checker warning: fs/ceph/mds_client.c:3968 reconnect_caps_cb() warn: missing error code here? '__get_cap_for_mds()' failed. 'err' = '0' [ idryomov: Dan says that Smatch considers it intentional only if the "ret = 0;" assignment is within 4 or 5 lines of the goto. ] Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-05-17nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode()Ryusuke Konishi
During unmount process of nilfs2, nothing holds nilfs_root structure after nilfs2 detaches its writer in nilfs_detach_log_writer(). However, since nilfs_evict_inode() uses nilfs_root for some cleanup operations, it may cause use-after-free read if inodes are left in "garbage_list" and released by nilfs_dispose_list() at the end of nilfs_detach_log_writer(). Fix this issue by modifying nilfs_evict_inode() to only clear inode without additional metadata changes that use nilfs_root if the file system is degraded to read-only or the writer is detached. Link: https://lkml.kernel.org/r/20230509152956.8313-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+78d4495558999f55d1da@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/00000000000099e5ac05fb1c3b85@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-17SMB3: drop reference to cfile before sending oplock breakBharath SM
In cifs_oplock_break function we drop reference to a cfile at the end of function, due to which close command goes on wire after lease break acknowledgment even if file is already closed by application but we had deferred the handle close. If other client with limited file shareaccess waiting on lease break ack proceeds operation on that file as soon as first client sends ack, then we may encounter status sharing violation error because of open handle. Solution is to put reference to cfile(send close on wire if last ref) and then send oplock acknowledgment to server. Fixes: 9e31678fb403 ("SMB3: fix lease break timeout when multiple deferred close handles for the same file.") Cc: stable@kernel.org Signed-off-by: Bharath SM <bharathsm@microsoft.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-17SMB3: Close all deferred handles of inode in case of handle lease breakBharath SM
Oplock break may occur for different file handle than the deferred handle. Check for inode deferred closes list, if it's not empty then close all the deferred handles of inode because we should not cache handles if we dont have handle lease. Eg: If openfilelist has one deferred file handle and another open file handle from app for a same file, then on a lease break we choose the first handle in openfile list. The first handle in list can be deferred handle or actual open file handle from app. In case if it is actual open handle then today, we don't close deferred handles if we lose handle lease on a file. Problem with this is, later if app decides to close the existing open handle then we still be caching deferred handles until deferred close timeout. Leaving open handle may result in sharing violation when windows client tries to open a file with limited file share access. So we should check for deferred list of inode and walk through the list of deferred files in inode and close all deferred files. Fixes: 9e31678fb403 ("SMB3: fix lease break timeout when multiple deferred close handles for the same file.") Cc: stable@kernel.org Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-17Merge tag 'nfsd-6.4-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - A collection of minor bug fixes * tag 'nfsd-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: Remove open coding of string copy SUNRPC: Fix trace_svc_register() call site SUNRPC: always free ctxt when freeing deferred request SUNRPC: double free xprt_ctxt while still in use SUNRPC: Fix error handling in svc_setup_socket() SUNRPC: Fix encoding of accepted but unsuccessful RPC replies lockd: define nlm_port_min,max with CONFIG_SYSCTL nfsd: define exports_proc_ops with CONFIG_PROC_FS SUNRPC: Avoid relying on crypto API to derive CBC-CTS output IV
2023-05-17fs: don't call posix_acl_listxattr in generic_listxattrJeff Layton
Commit f2620f166e2a caused the kernel to start emitting POSIX ACL xattrs for NFSv4 inodes, which it doesn't support. The only other user of generic_listxattr is HFS (classic) and it doesn't support POSIX ACLs either. Fixes: f2620f166e2a xattr: simplify listxattr helpers Reported-by: Ondrej Valousek <ondrej.valousek.xm@renesas.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Message-Id: <20230516124655.82283-1-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-05-17statfs: enforce statfs[64] structure initializationIlya Leoshkevich
s390's struct statfs and struct statfs64 contain padding, which field-by-field copying does not set. Initialize the respective structs with zeros before filling them and copying them to userspace, like it's already done for the compat versions of these structs. Found by KMSAN. [agordeev@linux.ibm.com: fixed typo in patch description] Acked-by: Heiko Carstens <hca@linux.ibm.com> Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/r/20230504144021.808932-2-iii@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-05-17btrfs: use nofs when cleaning up aborted transactionsJosef Bacik
Our CI system caught a lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 6.3.0-rc7+ #1167 Not tainted ------------------------------------------------------ kswapd0/46 is trying to acquire lock: ffff8c6543abd650 (sb_internal#2){++++}-{0:0}, at: btrfs_commit_inode_delayed_inode+0x5f/0x120 but task is already holding lock: ffffffffabe61b40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x4aa/0x7a0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: fs_reclaim_acquire+0xa5/0xe0 kmem_cache_alloc+0x31/0x2c0 alloc_extent_state+0x1d/0xd0 __clear_extent_bit+0x2e0/0x4f0 try_release_extent_mapping+0x216/0x280 btrfs_release_folio+0x2e/0x90 invalidate_inode_pages2_range+0x397/0x470 btrfs_cleanup_dirty_bgs+0x9e/0x210 btrfs_cleanup_one_transaction+0x22/0x760 btrfs_commit_transaction+0x3b7/0x13a0 create_subvol+0x59b/0x970 btrfs_mksubvol+0x435/0x4f0 __btrfs_ioctl_snap_create+0x11e/0x1b0 btrfs_ioctl_snap_create_v2+0xbf/0x140 btrfs_ioctl+0xa45/0x28f0 __x64_sys_ioctl+0x88/0xc0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc -> #0 (sb_internal#2){++++}-{0:0}: __lock_acquire+0x1435/0x21a0 lock_acquire+0xc2/0x2b0 start_transaction+0x401/0x730 btrfs_commit_inode_delayed_inode+0x5f/0x120 btrfs_evict_inode+0x292/0x3d0 evict+0xcc/0x1d0 inode_lru_isolate+0x14d/0x1e0 __list_lru_walk_one+0xbe/0x1c0 list_lru_walk_one+0x58/0x80 prune_icache_sb+0x39/0x60 super_cache_scan+0x161/0x1f0 do_shrink_slab+0x163/0x340 shrink_slab+0x1d3/0x290 shrink_node+0x300/0x720 balance_pgdat+0x35c/0x7a0 kswapd+0x205/0x410 kthread+0xf0/0x120 ret_from_fork+0x29/0x50 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(sb_internal#2); lock(fs_reclaim); lock(sb_internal#2); *** DEADLOCK *** 3 locks held by kswapd0/46: #0: ffffffffabe61b40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x4aa/0x7a0 #1: ffffffffabe50270 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x113/0x290 #2: ffff8c6543abd0e0 (&type->s_umount_key#44){++++}-{3:3}, at: super_cache_scan+0x38/0x1f0 stack backtrace: CPU: 0 PID: 46 Comm: kswapd0 Not tainted 6.3.0-rc7+ #1167 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x58/0x90 check_noncircular+0xd6/0x100 ? save_trace+0x3f/0x310 ? add_lock_to_list+0x97/0x120 __lock_acquire+0x1435/0x21a0 lock_acquire+0xc2/0x2b0 ? btrfs_commit_inode_delayed_inode+0x5f/0x120 start_transaction+0x401/0x730 ? btrfs_commit_inode_delayed_inode+0x5f/0x120 btrfs_commit_inode_delayed_inode+0x5f/0x120 btrfs_evict_inode+0x292/0x3d0 ? lock_release+0x134/0x270 ? __pfx_wake_bit_function+0x10/0x10 evict+0xcc/0x1d0 inode_lru_isolate+0x14d/0x1e0 __list_lru_walk_one+0xbe/0x1c0 ? __pfx_inode_lru_isolate+0x10/0x10 ? __pfx_inode_lru_isolate+0x10/0x10 list_lru_walk_one+0x58/0x80 prune_icache_sb+0x39/0x60 super_cache_scan+0x161/0x1f0 do_shrink_slab+0x163/0x340 shrink_slab+0x1d3/0x290 shrink_node+0x300/0x720 balance_pgdat+0x35c/0x7a0 kswapd+0x205/0x410 ? __pfx_autoremove_wake_function+0x10/0x10 ? __pfx_kswapd+0x10/0x10 kthread+0xf0/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x50 </TASK> This happens because when we abort the transaction in the transaction commit path we call invalidate_inode_pages2_range on our block group cache inodes (if we have space cache v1) and any delalloc inodes we may have. The plain invalidate_inode_pages2_range() call passes through GFP_KERNEL, which makes sense in most cases, but not here. Wrap these two invalidate callees with memalloc_nofs_save/memalloc_nofs_restore to make sure we don't end up with the fs reclaim dependency under the transaction dependency. CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-17btrfs: handle memory allocation failure in btrfs_csum_one_bioJohannes Thumshirn
Since f8a53bb58ec7 ("btrfs: handle checksum generation in the storage layer") the failures of btrfs_csum_one_bio() are handled via bio_end_io(). This means, we can return BLK_STS_RESOURCE from btrfs_csum_one_bio() in case the allocation of the ordered sums fails. This also fixes a syzkaller report, where injecting a failure into the kvzalloc() call results in a BUG_ON(). Reported-by: syzbot+d8941552e21eac774778@syzkaller.appspotmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-17btrfs: scrub: try harder to mark RAID56 block groups read-onlyQu Wenruo
Currently we allow a block group not to be marked read-only for scrub. But for RAID56 block groups if we require the block group to be read-only, then we're allowed to use cached content from scrub stripe to reduce unnecessary RAID56 reads. So this patch would: - Make btrfs_inc_block_group_ro() try harder During my tests, for cases like btrfs/061 and btrfs/064, we can hit ENOSPC from btrfs_inc_block_group_ro() calls during scrub. The reason is if we only have one single data chunk, and trying to scrub it, we won't have any space left for any newer data writes. But this check should be done by the caller, especially for scrub cases we only temporarily mark the chunk read-only. And newer data writes would always try to allocate a new data chunk when needed. - Return error for scrub if we failed to mark a RAID56 chunk read-only Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-16ksmbd: smb2: Allow messages padded to 8byte boundaryGustav Johansson
clc length is now accepted to <= 8 less than length, rather than < 8. Solve issues on some of Axis's smb clients which send messages where clc length is 8 bytes less than length. The specific client was running kernel 4.19.217 with smb dialect 3.0.2 on armv7l. Cc: stable@vger.kernel.org Signed-off-by: Gustav Johansson <gustajo@axis.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-16ksmbd: allocate one more byte for implied bcc[0]Chih-Yen Chang
ksmbd_smb2_check_message allows client to return one byte more, so we need to allocate additional memory in ksmbd_conn_handler_loop to avoid out-of-bound access. Cc: stable@vger.kernel.org Signed-off-by: Chih-Yen Chang <cc85nod@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-16ksmbd: fix wrong UserName check in session_userChih-Yen Chang
The offset of UserName is related to the address of security buffer. To ensure the validaty of UserName, we need to compare name_off + name_len with secbuf_len instead of auth_msg_len. [ 27.096243] ================================================================== [ 27.096890] BUG: KASAN: slab-out-of-bounds in smb_strndup_from_utf16+0x188/0x350 [ 27.097609] Read of size 2 at addr ffff888005e3b542 by task kworker/0:0/7 ... [ 27.099950] Call Trace: [ 27.100194] <TASK> [ 27.100397] dump_stack_lvl+0x33/0x50 [ 27.100752] print_report+0xcc/0x620 [ 27.102305] kasan_report+0xae/0xe0 [ 27.103072] kasan_check_range+0x35/0x1b0 [ 27.103757] smb_strndup_from_utf16+0x188/0x350 [ 27.105474] smb2_sess_setup+0xaf8/0x19c0 [ 27.107935] handle_ksmbd_work+0x274/0x810 [ 27.108315] process_one_work+0x419/0x760 [ 27.108689] worker_thread+0x2a2/0x6f0 [ 27.109385] kthread+0x160/0x190 [ 27.110129] ret_from_fork+0x1f/0x30 [ 27.110454] </TASK> Cc: stable@vger.kernel.org Signed-off-by: Chih-Yen Chang <cc85nod@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-16ksmbd: fix global-out-of-bounds in smb2_find_context_valsChih-Yen Chang
Add tag_len argument in smb2_find_context_vals() to avoid out-of-bound read when create_context's name_len is larger than tag length. [ 7.995411] ================================================================== [ 7.995866] BUG: KASAN: global-out-of-bounds in memcmp+0x83/0xa0 [ 7.996248] Read of size 8 at addr ffffffff8258d940 by task kworker/0:0/7 ... [ 7.998191] Call Trace: [ 7.998358] <TASK> [ 7.998503] dump_stack_lvl+0x33/0x50 [ 7.998743] print_report+0xcc/0x620 [ 7.999458] kasan_report+0xae/0xe0 [ 7.999895] kasan_check_range+0x35/0x1b0 [ 8.000152] memcmp+0x83/0xa0 [ 8.000347] smb2_find_context_vals+0xf7/0x1e0 [ 8.000635] smb2_open+0x1df2/0x43a0 [ 8.006398] handle_ksmbd_work+0x274/0x810 [ 8.006666] process_one_work+0x419/0x760 [ 8.006922] worker_thread+0x2a2/0x6f0 [ 8.007429] kthread+0x160/0x190 [ 8.007946] ret_from_fork+0x1f/0x30 [ 8.008181] </TASK> Cc: stable@vger.kernel.org Signed-off-by: Chih-Yen Chang <cc85nod@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-15NFSD: Remove open coding of string copyAzeem Shaikh
Instead of open coding a __dynamic_array(), use the __string() and __assign_str() helper macros that exist for this kind of use case. Part of an effort to remove deprecated strlcpy() [1] completely from the kernel[2]. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Fixes: 3c92fba557c6 ("NFSD: Enhance the nfsd_cb_setup tracepoint") Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-05-13Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Some ext4 bug fixes (mostly to address Syzbot reports)" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: bail out of ext4_xattr_ibody_get() fails for any reason ext4: add bounds checking in get_max_inline_xattr_value_size() ext4: add indication of ro vs r/w mounts in the mount message ext4: fix deadlock when converting an inline directory in nojournal mode ext4: improve error recovery code paths in __ext4_remount() ext4: improve error handling from ext4_dirhash() ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled ext4: check iomap type only if ext4_iomap_begin() does not fail ext4: avoid a potential slab-out-of-bounds in ext4_group_desc_csum ext4: fix data races when using cached status extents ext4: avoid deadlock in fs reclaim with page writeback ext4: fix invalid free tracking in ext4_xattr_move_to_block() ext4: remove a BUG_ON in ext4_mb_release_group_pa() ext4: allow ext4_get_group_info() to fail ext4: fix lockdep warning when enabling MMP ext4: fix WARNING in mb_find_extent
2023-05-13ext4: bail out of ext4_xattr_ibody_get() fails for any reasonTheodore Ts'o
In ext4_update_inline_data(), if ext4_xattr_ibody_get() fails for any reason, it's best if we just fail as opposed to stumbling on, especially if the failure is EFSCORRUPTED. Cc: stable@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13ext4: add bounds checking in get_max_inline_xattr_value_size()Theodore Ts'o
Normally the extended attributes in the inode body would have been checked when the inode is first opened, but if someone is writing to the block device while the file system is mounted, it's possible for the inode table to get corrupted. Add bounds checking to avoid reading beyond the end of allocated memory if this happens. Reported-by: syzbot+1966db24521e5f6e23f7@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=1966db24521e5f6e23f7 Cc: stable@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13ext4: add indication of ro vs r/w mounts in the mount messageTheodore Ts'o
Whether the file system is mounted read-only or read/write is more important than the quota mode, which we are already printing. Add the ro vs r/w indication since this can be helpful in debugging problems from the console log. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13ext4: fix deadlock when converting an inline directory in nojournal modeTheodore Ts'o
In no journal mode, ext4_finish_convert_inline_dir() can self-deadlock by calling ext4_handle_dirty_dirblock() when it already has taken the directory lock. There is a similar self-deadlock in ext4_incvert_inline_data_nolock() for data files which we'll fix at the same time. A simple reproducer demonstrating the problem: mke2fs -Fq -t ext2 -O inline_data -b 4k /dev/vdc 64 mount -t ext4 -o dirsync /dev/vdc /vdc cd /vdc mkdir file0 cd file0 touch file0 touch file1 attr -s BurnSpaceInEA -V abcde . touch supercalifragilisticexpialidocious Cc: stable@kernel.org Link: https://lore.kernel.org/r/20230507021608.1290720-1-tytso@mit.edu Reported-by: syzbot+91dccab7c64e2850a4e5@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=ba84cc80a9491d65416bc7877e1650c87530fe8a Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13ext4: improve error recovery code paths in __ext4_remount()Theodore Ts'o
If there are failures while changing the mount options in __ext4_remount(), we need to restore the old mount options. This commit fixes two problem. The first is there is a chance that we will free the old quota file names before a potential failure leading to a use-after-free. The second problem addressed in this commit is if there is a failed read/write to read-only transition, if the quota has already been suspended, we need to renable quota handling. Cc: stable@kernel.org Link: https://lore.kernel.org/r/20230506142419.984260-2-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13ext4: improve error handling from ext4_dirhash()Theodore Ts'o
The ext4_dirhash() will *almost* never fail, especially when the hash tree feature was first introduced. However, with the addition of support of encrypted, casefolded file names, that function can most certainly fail today. So make sure the callers of ext4_dirhash() properly check for failures, and reflect the errors back up to their callers. Cc: stable@kernel.org Link: https://lore.kernel.org/r/20230506142419.984260-1-tytso@mit.edu Reported-by: syzbot+394aa8a792cb99dbc837@syzkaller.appspotmail.com Reported-by: syzbot+344aaa8697ebd232bfc8@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=db56459ea4ac4a676ae4b4678f633e55da005a9b Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabledTheodore Ts'o
When a file system currently mounted read/only is remounted read/write, if we clear the SB_RDONLY flag too early, before the quota is initialized, and there is another process/thread constantly attempting to create a directory, it's possible to trigger the WARN_ON_ONCE(dquot_initialize_needed(inode)); in ext4_xattr_block_set(), with the following stack trace: WARNING: CPU: 0 PID: 5338 at fs/ext4/xattr.c:2141 ext4_xattr_block_set+0x2ef2/0x3680 RIP: 0010:ext4_xattr_block_set+0x2ef2/0x3680 fs/ext4/xattr.c:2141 Call Trace: ext4_xattr_set_handle+0xcd4/0x15c0 fs/ext4/xattr.c:2458 ext4_initxattrs+0xa3/0x110 fs/ext4/xattr_security.c:44 security_inode_init_security+0x2df/0x3f0 security/security.c:1147 __ext4_new_inode+0x347e/0x43d0 fs/ext4/ialloc.c:1324 ext4_mkdir+0x425/0xce0 fs/ext4/namei.c:2992 vfs_mkdir+0x29d/0x450 fs/namei.c:4038 do_mkdirat+0x264/0x520 fs/namei.c:4061 __do_sys_mkdirat fs/namei.c:4076 [inline] __se_sys_mkdirat fs/namei.c:4074 [inline] __x64_sys_mkdirat+0x89/0xa0 fs/namei.c:4074 Cc: stable@kernel.org Link: https://lore.kernel.org/r/20230506142419.984260-1-tytso@mit.edu Reported-by: syzbot+6385d7d3065524c5ca6d@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=6513f6cb5cd6b5fc9f37e3bb70d273b94be9c34c Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13ext4: check iomap type only if ext4_iomap_begin() does not failBaokun Li
When ext4_iomap_overwrite_begin() calls ext4_iomap_begin() map blocks may fail for some reason (e.g. memory allocation failure, bare disk write), and later because "iomap->type ! = IOMAP_MAPPED" triggers WARN_ON(). When ext4 iomap_begin() returns an error, it is normal that the type of iomap->type may not match the expectation. Therefore, we only determine if iomap->type is as expected when ext4_iomap_begin() is executed successfully. Cc: stable@kernel.org Reported-by: syzbot+08106c4b7d60702dbc14@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/00000000000015760b05f9b4eee9@google.com Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230505132429.714648-1-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>