summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2024-10-04ext4: fix off by one issue in alloc_flex_gd()Baokun Li
Wesley reported an issue: ================================================================== EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks ------------[ cut here ]------------ kernel BUG at fs/ext4/resize.c:324! CPU: 9 UID: 0 PID: 3576 Comm: resize2fs Not tainted 6.11.0+ #27 RIP: 0010:ext4_resize_fs+0x1212/0x12d0 Call Trace: __ext4_ioctl+0x4e0/0x1800 ext4_ioctl+0x12/0x20 __x64_sys_ioctl+0x99/0xd0 x64_sys_call+0x1206/0x20d0 do_syscall_64+0x72/0x110 entry_SYSCALL_64_after_hwframe+0x76/0x7e ================================================================== While reviewing the patch, Honza found that when adjusting resize_bg in alloc_flex_gd(), it was possible for flex_gd->resize_bg to be bigger than flexbg_size. The reproduction of the problem requires the following: o_group = flexbg_size * 2 * n; o_size = (o_group + 1) * group_size; n_group: [o_group + flexbg_size, o_group + flexbg_size * 2) o_size = (n_group + 1) * group_size; Take n=0,flexbg_size=16 as an example: last:15 |o---------------|--------------n-| o_group:0 resize to n_group:30 The corresponding reproducer is: img=test.img rm -f $img truncate -s 600M $img mkfs.ext4 -F $img -b 1024 -G 16 8M dev=`losetup -f --show $img` mkdir -p /tmp/test mount $dev /tmp/test resize2fs $dev 248M Delete the problematic plus 1 to fix the issue, and add a WARN_ON_ONCE() to prevent the issue from happening again. [ Note: another reproucer which this commit fixes is: img=test.img rm -f $img truncate -s 25MiB $img mkfs.ext4 -b 4096 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 $img truncate -s 3GiB $img dev=`losetup -f --show $img` mkdir -p /tmp/test mount $dev /tmp/test resize2fs $dev 3G umount $dev losetup -d $dev -- TYT ] Reported-by: Wesley Hershberger <wesley.hershberger@canonical.com> Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2081231 Reported-by: Stéphane Graber <stgraber@stgraber.org> Closes: https://lore.kernel.org/all/20240925143325.518508-1-aleksandr.mikhalitsyn@canonical.com/ Tested-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Tested-by: Eric Sandeen <sandeen@redhat.com> Fixes: 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()") Cc: stable@vger.kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240927133329.1015041-1-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-10-04ext4: mark fc as ineligible using an handle in ext4_xattr_set()Luis Henriques (SUSE)
Calling ext4_fc_mark_ineligible() with a NULL handle is racy and may result in a fast-commit being done before the filesystem is effectively marked as ineligible. This patch moves the call to this function so that an handle can be used. If a transaction fails to start, then there's not point in trying to mark the filesystem as ineligible, and an error will eventually be returned to user-space. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240923104909.18342-3-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2024-10-04ext4: use handle to mark fc as ineligible in __track_dentry_update()Luis Henriques (SUSE)
Calling ext4_fc_mark_ineligible() with a NULL handle is racy and may result in a fast-commit being done before the filesystem is effectively marked as ineligible. This patch fixes the calls to this function in __track_dentry_update() by adding an extra parameter to the callback used in ext4_fc_track_template(). Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240923104909.18342-2-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2024-10-04nfsd/localio: fix nfsd_file tracepoints to handle NULL rqstpMike Snitzer
Otherwise nfsd_file_acquire, nfsd_file_insert_err, and nfsd_file_cons_err will hit a NULL pointer when they are enabled and LOCALIO used. Example trace output (note xid is 0x0 and LOCALIO flag set): nfsd_file_acquire: xid=0x0 inode=0000000069a1b2e7 may_flags=WRITE|LOCALIO ref=1 nf_flags=HASHED|GC nf_may=WRITE nf_file=0000000070123234 status=0 Fixes: c63f0e48febf ("nfsd: add nfsd_file_acquire_local()") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-10-04Merge tag 'fsnotify_for_v6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify fixes from Jan Kara: "Fixes for an inotify deadlock and a data race in fsnotify" * tag 'fsnotify_for_v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: inotify: Fix possible deadlock in fsnotify_destroy_mark fsnotify: Avoid data race between fsnotify_recalc_mask() and fsnotify_object_watched()
2024-10-04Merge tag 'fs_for_v6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF fixes from Jan Kara: "A couple of UDF error handling fixes for issues spotted by syzbot" * tag 'fs_for_v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: fix uninit-value use in udf_get_fileshortad udf: refactor inode_bmap() to handle error udf: refactor udf_next_aext() to handle error udf: refactor udf_current_aext() to handle error
2024-10-04Merge tag 'ceph-for-6.12-rc2' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "A fix from Patrick for a variety of CephFS lockup scenarios caused by a regression in cap handling which sneaked in through the netfs helper library in 5.18 (marked for stable) and an unrelated one-line cleanup" * tag 'ceph-for-6.12-rc2' of https://github.com/ceph/ceph-client: ceph: fix cap ref leak via netfs init_request ceph: use struct_size() helper in __ceph_pool_perm_get()
2024-10-04Merge tag 'for-6.12-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - in incremental send, fix invalid clone operation for file that got its size decreased - fix __counted_by() annotation of send path cache entries, we do not store the terminating NUL - fix a longstanding bug in relocation (and quite hard to hit by chance), drop back reference cache that can get out of sync after transaction commit - wait for fixup worker kthread before finishing umount - add missing raid-stripe-tree extent for NOCOW files, zoned mode cannot have NOCOW files but RST is meant to be a standalone feature - handle transaction start error during relocation, avoid potential NULL pointer dereference of relocation control structure (reported by syzbot) - disable module-wide rate limiting of debug level messages - minor fix to tracepoint definition (reported by checkpatch.pl) * tag 'for-6.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: disable rate limiting when debug enabled btrfs: wait for fixup workers before stopping cleaner kthread during umount btrfs: fix a NULL pointer dereference when failed to start a new trasacntion btrfs: send: fix invalid clone operation for file that got its size decreased btrfs: tracepoints: end assignment with semicolon at btrfs_qgroup_extent event class btrfs: drop the backref cache during relocation if we commit btrfs: also add stripe entries for NOCOW writes btrfs: send: fix buffer overflow detection when copying path to cache entry
2024-10-04Merge tag 'v6.12-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - statfs fix (e.g. when limited access to root directory of share) - special file handling fixes: fix packet validation to avoid buffer overflow for reparse points, fixes for symlink path parsing (one for reparse points, and one for SFU use case), and fix for cleanup after failed SET_REPARSE operation. - fix for SMB2.1 signing bug introduced by recent patch to NFS symlink path, and NFS reparse point validation - comment cleanup * tag 'v6.12-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Do not convert delimiter when parsing NFS-style symlinks cifs: Validate content of NFS reparse point buffer cifs: Fix buffer overflow when parsing NFS reparse points smb: client: Correct typos in multiple comments across various files smb: client: use actual path when queryfs cifs: Remove intermediate object of failed create reparse call Revert "smb: client: make SHA-512 TFM ephemeral" smb: Update comments about some reparse point tags cifs: Check for UTF-16 null codepoint in SFU symlink target location
2024-10-04Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull close_range() fix from Al Viro: "Fix the logic in descriptor table trimming" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: close_range(): fix the logics in descriptor table trimming
2024-10-03Merge tag 'pull-fixes.ufs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull ufs fix from Al Viro: "Fix ufs_rename() braino introduced this cycle. The 'folio_release_kmap(dir_folio, new_dir)' in ufs_rename() part of folio conversion should've been getting a pointer to ufs directory entry within the page, rather than a pointer to directory struct inode..." * tag 'pull-fixes.ufs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ufs_rename(): fix bogus argument of folio_release_kmap()
2024-10-03nfs_common: fix Kconfig for NFS_COMMON_LOCALIO_SUPPORTMike Snitzer
The 'default n' that was in NFS_COMMON_LOCALIO_SUPPORT caused these extra defaults to be missed: default y if NFSD=y || NFS_FS=y default m if NFSD=m && NFS_FS=m Remove the 'default n' for NFS_COMMON_LOCALIO_SUPPORT so that the correct tristate is selected based on how NFSD and NFS_FS are configured. This fixes the reported case where NFS_FS=y but NFS_COMMON_LOCALIO_SUPPORT=m, it is now correctly set to =y. In addition, add extra 'depends on NFS_LOCALIO' to NFS_COMMON_LOCALIO_SUPPORT so that if NFS_LOCALIO isn't set then NFS_COMMON_LOCALIO_SUPPORT will not be either. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202410031944.hMCFY9BO-lkp@intel.com/ Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-10-03nfs_common: fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()Mike Snitzer
Add nfs_to_nfsd_file_put_local() interface to fix race with nfsd module unload. Similarly, use RCU around nfs_open_local_fh()'s error path call to nfs_to->nfsd_serv_put(). Holding RCU ensures that NFS will safely _call and return_ from its nfs_to calls into the NFSD functions nfsd_file_put_local() and nfsd_serv_put(). Otherwise, if RCU isn't used then there is a narrow window when NFS's reference for the nfsd_file and nfsd_serv are dropped and the NFSD module could be unloaded, which could result in a crash from the return instruction for either nfs_to->nfsd_file_put_local() or nfs_to->nfsd_serv_put(). Reported-by: NeilBrown <neilb@suse.de> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-10-03NFSv4: Prevent NULL-pointer dereference in nfs42_complete_copies()Yanjun Zhang
On the node of an NFS client, some files saved in the mountpoint of the NFS server were copied to another location of the same NFS server. Accidentally, the nfs42_complete_copies() got a NULL-pointer dereference crash with the following syslog: [232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116 [232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116 [232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058 [232066.588586] Mem abort info: [232066.588701] ESR = 0x0000000096000007 [232066.588862] EC = 0x25: DABT (current EL), IL = 32 bits [232066.589084] SET = 0, FnV = 0 [232066.589216] EA = 0, S1PTW = 0 [232066.589340] FSC = 0x07: level 3 translation fault [232066.589559] Data abort info: [232066.589683] ISV = 0, ISS = 0x00000007 [232066.589842] CM = 0, WnR = 0 [232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400 [232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000 [232066.590757] Internal error: Oops: 96000007 [#1] SMP [232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2 [232066.591052] vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs [232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1 [232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06 [232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4] [232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4] [232066.598595] sp : ffff8000f568fc70 [232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000 [232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001 [232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050 [232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000 [232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000 [232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6 [232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828 [232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a [232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058 [232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000 [232066.601636] Call trace: [232066.601749] nfs4_reclaim_open_state+0x220/0x800 [nfsv4] [232066.601998] nfs4_do_reclaim+0x1b8/0x28c [nfsv4] [232066.602218] nfs4_state_manager+0x928/0x10f0 [nfsv4] [232066.602455] nfs4_run_state_manager+0x78/0x1b0 [nfsv4] [232066.602690] kthread+0x110/0x114 [232066.602830] ret_from_fork+0x10/0x20 [232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00) [232066.603284] SMP: stopping secondary CPUs [232066.606936] Starting crashdump kernel... [232066.607146] Bye! Analysing the vmcore, we know that nfs4_copy_state listed by destination nfs_server->ss_copies was added by the field copies in handle_async_copy(), and we found a waiting copy process with the stack as: PID: 3511963 TASK: ffff710028b47e00 CPU: 0 COMMAND: "cp" #0 [ffff8001116ef740] __switch_to at ffff8000081b92f4 #1 [ffff8001116ef760] __schedule at ffff800008dd0650 #2 [ffff8001116ef7c0] schedule at ffff800008dd0a00 #3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0 #4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c #5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898 #6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4] #7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4] #8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4] #9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4] The NULL-pointer dereference was due to nfs42_complete_copies() listed the nfs_server->ss_copies by the field ss_copies of nfs4_copy_state. So the nfs4_copy_state address ffff0100f98fa3f0 was offset by 0x10 and the data accessed through this pointer was also incorrect. Generally, the ordered list nfs4_state_owner->so_states indicate open(O_RDWR) or open(O_WRITE) states are reclaimed firstly by nfs4_reclaim_open_state(). When destination state reclaim is failed with NFS_STATE_RECOVERY_FAILED and copies are not deleted in nfs_server->ss_copies, the source state may be passed to the nfs42_complete_copies() process earlier, resulting in this crash scene finally. To solve this issue, we add a list_head nfs_server->ss_src_copies for a server-to-server copy specially. Fixes: 0e65a32c8a56 ("NFS: handle source server reboot") Signed-off-by: Yanjun Zhang <zhangyanjun@cestc.cn> Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-10-03SUNRPC: Fix integer overflow in decode_rc_list()Dan Carpenter
The math in "rc_list->rcl_nrefcalls * 2 * sizeof(uint32_t)" could have an integer overflow. Add bounds checking on rc_list->rcl_nrefcalls to fix that. Fixes: 4aece6a19cf7 ("nfs41: cb_sequence xdr implementation") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-10-03cifs: Do not convert delimiter when parsing NFS-style symlinksPali Rohár
NFS-style symlinks have target location always stored in NFS/UNIX form where backslash means the real UNIX backslash and not the SMB path separator. So do not mangle slash and backslash content of NFS-style symlink during readlink() syscall as it is already in the correct Linux form. This fixes interoperability of NFS-style symlinks with backslashes created by Linux NFS3 client throw Windows NFS server and retrieved by Linux SMB client throw Windows SMB server, where both Windows servers exports the same directory. Fixes: d5ecebc4900d ("smb3: Allow query of symlinks stored as reparse points") Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-10-03cifs: Validate content of NFS reparse point bufferPali Rohár
Symlink target location stored in DataBuffer is encoded in UTF-16. So check that symlink DataBuffer length is non-zero and even number. And check that DataBuffer does not contain UTF-16 null codepoint because Linux cannot process symlink with null byte. DataBuffer for char and block devices is 8 bytes long as it contains two 32-bit numbers (major and minor). Add check for this. DataBuffer buffer for sockets and fifos zero-length. Add checks for this. Signed-off-by: Pali Rohár <pali@kernel.org> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-10-03cifs: Fix buffer overflow when parsing NFS reparse pointsPali Rohár
ReparseDataLength is sum of the InodeType size and DataBuffer size. So to get DataBuffer size it is needed to subtract InodeType's size from ReparseDataLength. Function cifs_strndup_from_utf16() is currentlly accessing buf->DataBuffer at position after the end of the buffer because it does not subtract InodeType size from the length. Fix this problem and correctly subtract variable len. Member InodeType is present only when reparse buffer is large enough. Check for ReparseDataLength before accessing InodeType to prevent another invalid memory access. Major and minor rdev values are present also only when reparse buffer is large enough. Check for reparse buffer size before calling reparse_mkdev(). Fixes: d5ecebc4900d ("smb3: Allow query of symlinks stored as reparse points") Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-10-03Merge tag 'v6.12-rc1-ksmbd-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: - small cleanup patches leveraging struct size to improve access bounds checking * tag 'v6.12-rc1-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: Use struct_size() to improve smb_direct_rdma_xmit() ksmbd: Annotate struct copychunk_ioctl_req with __counted_by_le() ksmbd: Use struct_size() to improve get_file_alternate_info()
2024-10-03Merge tag 'vfs-6.12-rc2.fixes.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "vfs: - Ensure that iter_folioq_get_pages() advances to the next slot otherwise it will end up using the same folio with an out-of-bound offset. iomap: - Dont unshare delalloc extents which can't be reflinked, and thus can't be shared. - Constrain the file range passed to iomap_file_unshare() directly in iomap instead of requiring the callers to do it. netfs: - Use folioq_count instead of folioq_nr_slot to prevent an unitialized value warning in netfs_clear_buffer(). - Fix missing wakeup after issuing writes by scheduling the write collector only if all the subrequest queues are empty and thus no writes are pending. - Fix two minor documentation bugs" * tag 'vfs-6.12-rc2.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: constrain the file range passed to iomap_file_unshare iomap: don't bother unsharing delalloc extents netfs: Fix missing wakeup after issuing writes Documentation: add missing folio_queue entry folio_queue: fix documentation netfs: Fix a KMSAN uninit-value error in netfs_clear_buffer iov_iter: fix advancing slot in iter_folioq_get_pages()
2024-10-03iomap: constrain the file range passed to iomap_file_unshareDarrick J. Wong
File contents can only be shared (i.e. reflinked) below EOF, so it makes no sense to try to unshare ranges beyond EOF. Constrain the file range parameters here so that we don't have to do that in the callers. Fixes: 5f4e5752a8a3 ("fs: add iomap_file_dirty") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20241002150213.GC21853@frogsfrogsfrogs Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-03iomap: don't bother unsharing delalloc extentsDarrick J. Wong
If unshare encounters a delalloc reservation in the srcmap, that means that the file range isn't shared because delalloc reservations cannot be reflinked. Therefore, don't try to unshare them. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20241002150040.GB21853@frogsfrogsfrogs Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-03ceph: fix cap ref leak via netfs init_requestPatrick Donnelly
Log recovered from a user's cluster: <7>[ 5413.970692] ceph: get_cap_refs 00000000958c114b ret 1 got Fr <7>[ 5413.970695] ceph: start_read 00000000958c114b, no cache cap ... <7>[ 5473.934609] ceph: my wanted = Fr, used = Fr, dirty - <7>[ 5473.934616] ceph: revocation: pAsLsXsFr -> pAsLsXs (revoking Fr) <7>[ 5473.934632] ceph: __ceph_caps_issued 00000000958c114b cap 00000000f7784259 issued pAsLsXs <7>[ 5473.934638] ceph: check_caps 10000000e68.fffffffffffffffe file_want - used Fr dirty - flushing - issued pAsLsXs revoking Fr retain pAsLsXsFsr AUTHONLY NOINVAL FLUSH_FORCE The MDS subsequently complains that the kernel client is late releasing caps. Approximately, a series of changes to this code by commits 49870056005c ("ceph: convert ceph_readpages to ceph_readahead"), 2de160417315 ("netfs: Change ->init_request() to return an error code") and a5c9dc445139 ("ceph: Make ceph_init_request() check caps on readahead") resulted in subtle resource cleanup to be missed. The main culprit is the change in error handling in 2de160417315 which meant that a failure in init_request() would no longer cause cleanup to be called. That would prevent the ceph_put_cap_refs() call which would cleanup the leaked cap ref. Cc: stable@vger.kernel.org Fixes: a5c9dc445139 ("ceph: Make ceph_init_request() check caps on readahead") Link: https://tracker.ceph.com/issues/67008 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-10-03ceph: use struct_size() helper in __ceph_pool_perm_get()Thorsten Blum
Use struct_size() to calculate the number of bytes to be allocated. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-10-02bcachefs: Fix trans_commit disk accounting revertKent Overstreet
We only are applying JSET_ENTRY_TYPE_write_buffer_keys, revert path was missed. Fixes: a3581ca35d2b ("bcachefs: Fix BCH_TRANS_COMMIT_skip_accounting_apply") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-02bcachefs: Fix bch2_inode_is_open() checkKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-02bcachefs: Fix return type of dirent_points_to_inode_nowarn()Kent Overstreet
we're returning an error code now, not a bool Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-02Merge tag 'pull-work.unaligned' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull generic unaligned.h cleanups from Al Viro: "Get rid of architecture-specific <asm/unaligned.h> includes, replacing them with a single generic <linux/unaligned.h> header file. It's the second largest (after asm/io.h) class of asm/* includes, and all but two architectures actually end up using exact same file. Massage the remaining two (arc and parisc) to do the same and just move the thing to from asm-generic/unaligned.h to linux/unaligned.h" [ This is one of those things that we're better off doing outside the merge window, and would only cause extra conflict noise if it was in linux-next for the next release due to all the trivial #include line updates. Rip off the band-aid. - Linus ] * tag 'pull-work.unaligned' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: move asm/unaligned.h to linux/unaligned.h arc: get rid of private asm/unaligned.h parisc: get rid of private asm/unaligned.h
2024-10-02smb: client: Correct typos in multiple comments across various filesShen Lichuan
Fixed some confusing typos that were currently identified witch codespell, the details are as follows: -in the code comments: fs/smb/client/cifsacl.h:58: inheritence ==> inheritance fs/smb/client/cifsencrypt.c:242: origiginal ==> original fs/smb/client/cifsfs.c:164: referece ==> reference fs/smb/client/cifsfs.c:292: ned ==> need fs/smb/client/cifsglob.h:779: initital ==> initial fs/smb/client/cifspdu.h:784: altetnative ==> alternative fs/smb/client/cifspdu.h:2409: conrol ==> control fs/smb/client/cifssmb.c:1218: Expirement ==> Experiment fs/smb/client/cifssmb.c:3021: conver ==> convert fs/smb/client/cifssmb.c:3998: asterik ==> asterisk fs/smb/client/file.c:2505: useable ==> usable fs/smb/client/fs_context.h:263: timemout ==> timeout fs/smb/client/misc.c:257: responsbility ==> responsibility fs/smb/client/netmisc.c:1006: divisable ==> divisible fs/smb/client/readdir.c:556: endianess ==> endianness fs/smb/client/readdir.c:818: bu ==> by fs/smb/client/smb2ops.c:2180: snaphots ==> snapshots fs/smb/client/smb2ops.c:3586: otions ==> options fs/smb/client/smb2pdu.c:2979: timestaps ==> timestamps fs/smb/client/smb2pdu.c:4574: memmory ==> memory fs/smb/client/smb2transport.c:699: origiginal ==> original fs/smb/client/smbdirect.c:222: happenes ==> happens fs/smb/client/smbdirect.c:1347: registartions ==> registrations fs/smb/client/smbdirect.h:114: accoutning ==> accounting Signed-off-by: Shen Lichuan <shenlichuan@vivo.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-10-02move asm/unaligned.h to linux/unaligned.hAl Viro
asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-10-02Merge tag 'zonefs-6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs update from Damien Le Moal: - Add support for the FS_IOC_GETFSSYSFSPATH ioctl * tag 'zonefs-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: add support for FS_IOC_GETFSSYSFSPATH
2024-10-02netfs: Fix missing wakeup after issuing writesDavid Howells
After dividing up a proposed write into subrequests, netfslib sets NETFS_RREQ_ALL_QUEUED to indicate to the collector that it can move on to the final cleanup once it has emptied the subrequest queues. Now, whilst the collector will normally end up running at least once after this bit is set just because it takes a while to process all the write subrequests before the collector runs out of subrequests, there exists the possibility that the issuing thread will be forced to sleep and the collector thread will clean up all the subrequests before ALL_QUEUED gets set. In such a case, the collector thread will not get triggered again and will never clear NETFS_RREQ_IN_PROGRESS thus leaving a request uncompleted and causing a potential futute hang. Fix this by scheduling the write collector if all the subrequest queues are empty (and thus no writes pending issuance). Note that we'd do this ideally before queuing the subrequest, but in the case of buffered writeback, at least, we can't find out that we've run out of folios until after we've called writeback_iter() and it has returned NULL - at which point we might not actually have any subrequests still under construction. Fixes: 288ace2f57c9 ("netfs: New writeback implementation") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/3317784.1727880350@warthog.procyon.org.uk cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-02inotify: Fix possible deadlock in fsnotify_destroy_markLizhi Xu
[Syzbot reported] WARNING: possible circular locking dependency detected 6.11.0-rc4-syzkaller-00019-gb311c1b497e5 #0 Not tainted ------------------------------------------------------ kswapd0/78 is trying to acquire lock: ffff88801b8d8930 (&group->mark_mutex){+.+.}-{3:3}, at: fsnotify_group_lock include/linux/fsnotify_backend.h:270 [inline] ffff88801b8d8930 (&group->mark_mutex){+.+.}-{3:3}, at: fsnotify_destroy_mark+0x38/0x3c0 fs/notify/mark.c:578 but task is already holding lock: ffffffff8ea2fd60 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:6841 [inline] ffffffff8ea2fd60 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0xbb4/0x35a0 mm/vmscan.c:7223 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: ... kmem_cache_alloc_noprof+0x3d/0x2a0 mm/slub.c:4044 inotify_new_watch fs/notify/inotify/inotify_user.c:599 [inline] inotify_update_watch fs/notify/inotify/inotify_user.c:647 [inline] __do_sys_inotify_add_watch fs/notify/inotify/inotify_user.c:786 [inline] __se_sys_inotify_add_watch+0x72e/0x1070 fs/notify/inotify/inotify_user.c:729 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&group->mark_mutex){+.+.}-{3:3}: ... __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752 fsnotify_group_lock include/linux/fsnotify_backend.h:270 [inline] fsnotify_destroy_mark+0x38/0x3c0 fs/notify/mark.c:578 fsnotify_destroy_marks+0x14a/0x660 fs/notify/mark.c:934 fsnotify_inoderemove include/linux/fsnotify.h:264 [inline] dentry_unlink_inode+0x2e0/0x430 fs/dcache.c:403 __dentry_kill+0x20d/0x630 fs/dcache.c:610 shrink_kill+0xa9/0x2c0 fs/dcache.c:1055 shrink_dentry_list+0x2c0/0x5b0 fs/dcache.c:1082 prune_dcache_sb+0x10f/0x180 fs/dcache.c:1163 super_cache_scan+0x34f/0x4b0 fs/super.c:221 do_shrink_slab+0x701/0x1160 mm/shrinker.c:435 shrink_slab+0x1093/0x14d0 mm/shrinker.c:662 shrink_one+0x43b/0x850 mm/vmscan.c:4815 shrink_many mm/vmscan.c:4876 [inline] lru_gen_shrink_node mm/vmscan.c:4954 [inline] shrink_node+0x3799/0x3de0 mm/vmscan.c:5934 kswapd_shrink_node mm/vmscan.c:6762 [inline] balance_pgdat mm/vmscan.c:6954 [inline] kswapd+0x1bcd/0x35a0 mm/vmscan.c:7223 [Analysis] The problem is that inotify_new_watch() is using GFP_KERNEL to allocate new watches under group->mark_mutex, however if dentry reclaim races with unlinking of an inode, it can end up dropping the last dentry reference for an unlinked inode resulting in removal of fsnotify mark from reclaim context which wants to acquire group->mark_mutex as well. This scenario shows that all notification groups are in principle prone to this kind of a deadlock (previously, we considered only fanotify and dnotify to be problematic for other reasons) so make sure all allocations under group->mark_mutex happen with GFP_NOFS. Reported-and-tested-by: syzbot+c679f13773f295d2da53@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c679f13773f295d2da53 Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240927143642.2369508-1-lizhi.xu@windriver.com
2024-10-02fsnotify: Avoid data race between fsnotify_recalc_mask() and ↵Jan Kara
fsnotify_object_watched() When __fsnotify_recalc_mask() recomputes the mask on the watched object, the compiler can "optimize" the code to perform partial updates to the mask (including zeroing it at the beginning). Thus places checking the object mask without conn->lock such as fsnotify_object_watched() could see invalid states of the mask. Make sure the mask update is performed by one memory store using WRITE_ONCE(). Reported-by: syzbot+701037856c25b143f1ad@syzkaller.appspotmail.com Reported-by: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/all/CACT4Y+Zk0ohwwwHSD63U2-PQ=UuamXczr1mKBD6xtj2dyYKBvA@mail.gmail.com Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://patch.msgid.link/20240717140623.27768-1-jack@suse.cz
2024-10-02udf: fix uninit-value use in udf_get_fileshortadGianfranco Trad
Check for overflow when computing alen in udf_current_aext to mitigate later uninit-value use in udf_get_fileshortad KMSAN bug[1]. After applying the patch reproducer did not trigger any issue[2]. [1] https://syzkaller.appspot.com/bug?extid=8901c4560b7ab5c2f9df [2] https://syzkaller.appspot.com/x/log.txt?x=10242227980000 Reported-by: syzbot+8901c4560b7ab5c2f9df@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=8901c4560b7ab5c2f9df Tested-by: syzbot+8901c4560b7ab5c2f9df@syzkaller.appspotmail.com Suggested-by: Jan Kara <jack@suse.com> Signed-off-by: Gianfranco Trad <gianf.trad@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240925074613.8475-3-gianf.trad@gmail.com
2024-10-02udf: refactor inode_bmap() to handle errorZhao Mengmeng
Refactor inode_bmap() to handle error since udf_next_aext() can return error now. On situations like ftruncate, udf_extend_file() can now detect errors and bail out early without resorting to checking for particular offsets and assuming internal behavior of these functions. Reported-by: syzbot+7a4842f0b1801230a989@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7a4842f0b1801230a989 Tested-by: syzbot+7a4842f0b1801230a989@syzkaller.appspotmail.com Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20241001115425.266556-4-zhaomzhao@126.com
2024-10-02udf: refactor udf_next_aext() to handle errorZhao Mengmeng
Since udf_current_aext() has error handling, udf_next_aext() should have error handling too. Besides, when too many indirect extents found in one inode, return -EFSCORRUPTED; when reading block failed, return -EIO. Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20241001115425.266556-3-zhaomzhao@126.com
2024-10-02udf: refactor udf_current_aext() to handle errorZhao Mengmeng
As Jan suggested in links below, refactor udf_current_aext() to differentiate between error, hit EOF and success, it now takes pointer to etype to store the extent type, return 1 when getting etype success, return 0 when hitting EOF and return -errno when err. Link: https://lore.kernel.org/all/20240912111235.6nr3wuqvktecy3vh@quack3/ Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20241001115425.266556-2-zhaomzhao@126.com
2024-10-02ufs_rename(): fix bogus argument of folio_release_kmap()Al Viro
new_dir does *NOT* point into dir_folio - it's an inode, not a pointer to ufs directory entry. Fixes: 516b97cf03dd6 "ufs: Convert directory handling to kmap_local" Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-10-01smb: client: use actual path when queryfswangrong
Due to server permission control, the client does not have access to the shared root directory, but can access subdirectories normally, so users usually mount the shared subdirectories directly. In this case, queryfs should use the actual path instead of the root directory to avoid the call returning an error (EACCES). Signed-off-by: wangrong <wangrong@uniontech.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-10-01bcachefs: Fix bad shift in bch2_read_flag_list()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-01ksmbd: Use struct_size() to improve smb_direct_rdma_xmit()Thorsten Blum
Use struct_size() to calculate the number of bytes to allocate for a new message. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-10-01ksmbd: Annotate struct copychunk_ioctl_req with __counted_by_le()Thorsten Blum
Add the __counted_by_le compiler attribute to the flexible array member Chunks to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. Change the data type of the flexible array member Chunks from __u8[] to struct srv_copychunk[] for ChunkCount to match the number of elements in the Chunks array. (With __u8[], each srv_copychunk would occupy 24 array entries and the __counted_by compiler attribute wouldn't be applicable.) Use struct_size() to calculate the size of the copychunk_ioctl_req. Read Chunks[0] after checking that ChunkCount is not 0. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-10-01ksmbd: Use struct_size() to improve get_file_alternate_info()Thorsten Blum
Use struct_size() to calculate the output buffer length. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-10-01btrfs: disable rate limiting when debug enabledLeo Martins
Disable ratelimiting for btrfs_printk when CONFIG_BTRFS_DEBUG is enabled. This allows for more verbose output which is often needed by functions like btrfs_dump_space_info(). Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Leo Martins <loemra.dev@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01btrfs: wait for fixup workers before stopping cleaner kthread during umountFilipe Manana
During unmount, at close_ctree(), we have the following steps in this order: 1) Park the cleaner kthread - this doesn't destroy the kthread, it basically halts its execution (wake ups against it work but do nothing); 2) We stop the cleaner kthread - this results in freeing the respective struct task_struct; 3) We call btrfs_stop_all_workers() which waits for any jobs running in all the work queues and then free the work queues. Syzbot reported a case where a fixup worker resulted in a crash when doing a delayed iput on its inode while attempting to wake up the cleaner at btrfs_add_delayed_iput(), because the task_struct of the cleaner kthread was already freed. This can happen during unmount because we don't wait for any fixup workers still running before we call kthread_stop() against the cleaner kthread, which stops and free all its resources. Fix this by waiting for any fixup workers at close_ctree() before we call kthread_stop() against the cleaner and run pending delayed iputs. The stack traces reported by syzbot were the following: BUG: KASAN: slab-use-after-free in __lock_acquire+0x77/0x2050 kernel/locking/lockdep.c:5065 Read of size 8 at addr ffff8880272a8a18 by task kworker/u8:3/52 CPU: 1 UID: 0 PID: 52 Comm: kworker/u8:3 Not tainted 6.12.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Workqueue: btrfs-fixup btrfs_work_helper Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 __lock_acquire+0x77/0x2050 kernel/locking/lockdep.c:5065 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162 class_raw_spinlock_irqsave_constructor include/linux/spinlock.h:551 [inline] try_to_wake_up+0xb0/0x1480 kernel/sched/core.c:4154 btrfs_writepage_fixup_worker+0xc16/0xdf0 fs/btrfs/inode.c:2842 btrfs_work_helper+0x390/0xc50 fs/btrfs/async-thread.c:314 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310 worker_thread+0x870/0xd30 kernel/workqueue.c:3391 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Allocated by task 2: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 unpoison_slab_object mm/kasan/common.c:319 [inline] __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:345 kasan_slab_alloc include/linux/kasan.h:247 [inline] slab_post_alloc_hook mm/slub.c:4086 [inline] slab_alloc_node mm/slub.c:4135 [inline] kmem_cache_alloc_node_noprof+0x16b/0x320 mm/slub.c:4187 alloc_task_struct_node kernel/fork.c:180 [inline] dup_task_struct+0x57/0x8c0 kernel/fork.c:1107 copy_process+0x5d1/0x3d50 kernel/fork.c:2206 kernel_clone+0x223/0x880 kernel/fork.c:2787 kernel_thread+0x1bc/0x240 kernel/fork.c:2849 create_kthread kernel/kthread.c:412 [inline] kthreadd+0x60d/0x810 kernel/kthread.c:765 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Freed by task 61: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579 poison_slab_object mm/kasan/common.c:247 [inline] __kasan_slab_free+0x59/0x70 mm/kasan/common.c:264 kasan_slab_free include/linux/kasan.h:230 [inline] slab_free_hook mm/slub.c:2343 [inline] slab_free mm/slub.c:4580 [inline] kmem_cache_free+0x1a2/0x420 mm/slub.c:4682 put_task_struct include/linux/sched/task.h:144 [inline] delayed_put_task_struct+0x125/0x300 kernel/exit.c:228 rcu_do_batch kernel/rcu/tree.c:2567 [inline] rcu_core+0xaaa/0x17a0 kernel/rcu/tree.c:2823 handle_softirqs+0x2c5/0x980 kernel/softirq.c:554 __do_softirq kernel/softirq.c:588 [inline] invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637 irq_exit_rcu+0x9/0x30 kernel/softirq.c:649 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1037 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1037 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 Last potentially related work creation: kasan_save_stack+0x3f/0x60 mm/kasan/common.c:47 __kasan_record_aux_stack+0xac/0xc0 mm/kasan/generic.c:541 __call_rcu_common kernel/rcu/tree.c:3086 [inline] call_rcu+0x167/0xa70 kernel/rcu/tree.c:3190 context_switch kernel/sched/core.c:5318 [inline] __schedule+0x184b/0x4ae0 kernel/sched/core.c:6675 schedule_idle+0x56/0x90 kernel/sched/core.c:6793 do_idle+0x56a/0x5d0 kernel/sched/idle.c:354 cpu_startup_entry+0x42/0x60 kernel/sched/idle.c:424 start_secondary+0x102/0x110 arch/x86/kernel/smpboot.c:314 common_startup_64+0x13e/0x147 The buggy address belongs to the object at ffff8880272a8000 which belongs to the cache task_struct of size 7424 The buggy address is located 2584 bytes inside of freed 7424-byte region [ffff8880272a8000, ffff8880272a9d00) The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x272a8 head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0xfff00000000040(head|node=0|zone=1|lastcpupid=0x7ff) page_type: f5(slab) raw: 00fff00000000040 ffff88801bafa500 dead000000000122 0000000000000000 raw: 0000000000000000 0000000080040004 00000001f5000000 0000000000000000 head: 00fff00000000040 ffff88801bafa500 dead000000000122 0000000000000000 head: 0000000000000000 0000000080040004 00000001f5000000 0000000000000000 head: 00fff00000000003 ffffea00009caa01 ffffffffffffffff 0000000000000000 head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 2, tgid 2 (kthreadd), ts 71247381401, free_ts 71214998153 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537 prep_new_page mm/page_alloc.c:1545 [inline] get_page_from_freelist+0x3039/0x3180 mm/page_alloc.c:3457 __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4733 alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265 alloc_slab_page+0x6a/0x120 mm/slub.c:2413 allocate_slab+0x5a/0x2f0 mm/slub.c:2579 new_slab mm/slub.c:2632 [inline] ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3819 __slab_alloc+0x58/0xa0 mm/slub.c:3909 __slab_alloc_node mm/slub.c:3962 [inline] slab_alloc_node mm/slub.c:4123 [inline] kmem_cache_alloc_node_noprof+0x1fe/0x320 mm/slub.c:4187 alloc_task_struct_node kernel/fork.c:180 [inline] dup_task_struct+0x57/0x8c0 kernel/fork.c:1107 copy_process+0x5d1/0x3d50 kernel/fork.c:2206 kernel_clone+0x223/0x880 kernel/fork.c:2787 kernel_thread+0x1bc/0x240 kernel/fork.c:2849 create_kthread kernel/kthread.c:412 [inline] kthreadd+0x60d/0x810 kernel/kthread.c:765 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 page last free pid 5230 tgid 5230 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] free_pages_prepare mm/page_alloc.c:1108 [inline] free_unref_page+0xcd0/0xf00 mm/page_alloc.c:2638 discard_slab mm/slub.c:2678 [inline] __put_partials+0xeb/0x130 mm/slub.c:3146 put_cpu_partial+0x17c/0x250 mm/slub.c:3221 __slab_free+0x2ea/0x3d0 mm/slub.c:4450 qlink_free mm/kasan/quarantine.c:163 [inline] qlist_free_all+0x9a/0x140 mm/kasan/quarantine.c:179 kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286 __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:329 kasan_slab_alloc include/linux/kasan.h:247 [inline] slab_post_alloc_hook mm/slub.c:4086 [inline] slab_alloc_node mm/slub.c:4135 [inline] kmem_cache_alloc_noprof+0x135/0x2a0 mm/slub.c:4142 getname_flags+0xb7/0x540 fs/namei.c:139 do_sys_openat2+0xd2/0x1d0 fs/open.c:1409 do_sys_open fs/open.c:1430 [inline] __do_sys_openat fs/open.c:1446 [inline] __se_sys_openat fs/open.c:1441 [inline] __x64_sys_openat+0x247/0x2a0 fs/open.c:1441 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Memory state around the buggy address: ffff8880272a8900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880272a8980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8880272a8a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8880272a8a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880272a8b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Reported-by: syzbot+8aaf2df2ef0164ffe1fb@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/66fb36b1.050a0220.aab67.003b.GAE@google.com/ CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01btrfs: fix a NULL pointer dereference when failed to start a new trasacntionQu Wenruo
[BUG] Syzbot reported a NULL pointer dereference with the following crash: FAULT_INJECTION: forcing a failure. start_transaction+0x830/0x1670 fs/btrfs/transaction.c:676 prepare_to_relocate+0x31f/0x4c0 fs/btrfs/relocation.c:3642 relocate_block_group+0x169/0xd20 fs/btrfs/relocation.c:3678 ... BTRFS info (device loop0): balance: ended with status: -12 Oops: general protection fault, probably for non-canonical address 0xdffffc00000000cc: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000660-0x0000000000000667] RIP: 0010:btrfs_update_reloc_root+0x362/0xa80 fs/btrfs/relocation.c:926 Call Trace: <TASK> commit_fs_roots+0x2ee/0x720 fs/btrfs/transaction.c:1496 btrfs_commit_transaction+0xfaf/0x3740 fs/btrfs/transaction.c:2430 del_balance_item fs/btrfs/volumes.c:3678 [inline] reset_balance_state+0x25e/0x3c0 fs/btrfs/volumes.c:3742 btrfs_balance+0xead/0x10c0 fs/btrfs/volumes.c:4574 btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3673 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f [CAUSE] The allocation failure happens at the start_transaction() inside prepare_to_relocate(), and during the error handling we call unset_reloc_control(), which makes fs_info->balance_ctl to be NULL. Then we continue the error path cleanup in btrfs_balance() by calling reset_balance_state() which will call del_balance_item() to fully delete the balance item in the root tree. However during the small window between set_reloc_contrl() and unset_reloc_control(), we can have a subvolume tree update and created a reloc_root for that subvolume. Then we go into the final btrfs_commit_transaction() of del_balance_item(), and into btrfs_update_reloc_root() inside commit_fs_roots(). That function checks if fs_info->reloc_ctl is in the merge_reloc_tree stage, but since fs_info->reloc_ctl is NULL, it results a NULL pointer dereference. [FIX] Just add extra check on fs_info->reloc_ctl inside btrfs_update_reloc_root(), before checking fs_info->reloc_ctl->merge_reloc_tree. That DEAD_RELOC_TREE handling is to prevent further modification to the reloc tree during merge stage, but since there is no reloc_ctl at all, we do not need to bother that. Reported-by: syzbot+283673dbc38527ef9f3d@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/66f6bfa7.050a0220.38ace9.0019.GAE@google.com/ CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01btrfs: send: fix invalid clone operation for file that got its size decreasedFilipe Manana
During an incremental send we may end up sending an invalid clone operation, for the last extent of a file which ends at an unaligned offset that matches the final i_size of the file in the send snapshot, in case the file had its initial size (the size in the parent snapshot) decreased in the send snapshot. In this case the destination will fail to apply the clone operation because its end offset is not sector size aligned and it ends before the current size of the file. Sending the truncate operation always happens when we finish processing an inode, after we process all its extents (and xattrs, names, etc). So fix this by ensuring the file has a valid size before we send a clone operation for an unaligned extent that ends at the final i_size of the file. The size we truncate to matches the start offset of the clone range but it could be any value between that start offset and the final size of the file since the clone operation will expand the i_size if the current size is smaller than the end offset. The start offset of the range was chosen because it's always sector size aligned and avoids a truncation into the middle of a page, which results in dirtying the page due to filling part of it with zeroes and then making the clone operation at the receiver trigger IO. The following test reproduces the issue: $ cat test.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi mkfs.btrfs -f $DEV mount $DEV $MNT # Create a file with a size of 256K + 5 bytes, having two extents, one # with a size of 128K and another one with a size of 128K + 5 bytes. last_ext_size=$((128 * 1024 + 5)) xfs_io -f -d -c "pwrite -S 0xab -b 128K 0 128K" \ -c "pwrite -S 0xcd -b $last_ext_size 128K $last_ext_size" \ $MNT/foo # Another file which we will later clone foo into, but initially with # a larger size than foo. xfs_io -f -c "pwrite -S 0xef 0 1M" $MNT/bar btrfs subvolume snapshot -r $MNT/ $MNT/snap1 # Now resize bar and clone foo into it. xfs_io -c "truncate 0" \ -c "reflink $MNT/foo" $MNT/bar btrfs subvolume snapshot -r $MNT/ $MNT/snap2 rm -f /tmp/send-full /tmp/send-inc btrfs send -f /tmp/send-full $MNT/snap1 btrfs send -p $MNT/snap1 -f /tmp/send-inc $MNT/snap2 umount $MNT mkfs.btrfs -f $DEV mount $DEV $MNT btrfs receive -f /tmp/send-full $MNT btrfs receive -f /tmp/send-inc $MNT umount $MNT Running it before this patch: $ ./test.sh (...) At subvol snap1 At snapshot snap2 ERROR: failed to clone extents to bar: Invalid argument A test case for fstests will be sent soon. Reported-by: Ben Millwood <thebenmachine@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CAJhrHS2z+WViO2h=ojYvBPDLsATwLbg+7JaNCyYomv0fUxEpQQ@mail.gmail.com/ Fixes: 46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size") CC: stable@vger.kernel.org # 6.11 Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01btrfs: drop the backref cache during relocation if we commitJosef Bacik
Since the inception of relocation we have maintained the backref cache across transaction commits, updating the backref cache with the new bytenr whenever we COWed blocks that were in the cache, and then updating their bytenr once we detected a transaction id change. This works as long as we're only ever modifying blocks, not changing the structure of the tree. However relocation does in fact change the structure of the tree. For example, if we are relocating a data extent, we will look up all the leaves that point to this data extent. We will then call do_relocation() on each of these leaves, which will COW down to the leaf and then update the file extent location. But, a key feature of do_relocation() is the pending list. This is all the pending nodes that we modified when we updated the file extent item. We will then process all of these blocks via finish_pending_nodes, which calls do_relocation() on all of the nodes that led up to that leaf. The purpose of this is to make sure we don't break sharing unless we absolutely have to. Consider the case that we have 3 snapshots that all point to this leaf through the same nodes, the initial COW would have created a whole new path. If we did this for all 3 snapshots we would end up with 3x the number of nodes we had originally. To avoid this we will cycle through each of the snapshots that point to each of these nodes and update their pointers to point at the new nodes. Once we update the pointer to the new node we will drop the node we removed the link for and all of its children via btrfs_drop_subtree(). This is essentially just btrfs_drop_snapshot(), but for an arbitrary point in the snapshot. The problem with this is that we will never reflect this in the backref cache. If we do this btrfs_drop_snapshot() for a node that is in the backref tree, we will leave the node in the backref tree. This becomes a problem when we change the transid, as now the backref cache has entire subtrees that no longer exist, but exist as if they still are pointed to by the same roots. In the best case scenario you end up with "adding refs to an existing tree ref" errors from insert_inline_extent_backref(), where we attempt to link in nodes on roots that are no longer valid. Worst case you will double free some random block and re-use it when there's still references to the block. This is extremely subtle, and the consequences are quite bad. There isn't a way to make sure our backref cache is consistent between transid's. In order to fix this we need to simply evict the entire backref cache anytime we cross transid's. This reduces performance in that we have to rebuild this backref cache every time we change transid's, but fixes the bug. This has existed since relocation was added, and is a pretty critical bug. There's a lot more cleanup that can be done now that this functionality is going away, but this patch is as small as possible in order to fix the problem and make it easy for us to backport it to all the kernels it needs to be backported to. Followup series will dismantle more of this code and simplify relocation drastically to remove this functionality. We have a reproducer that reproduced the corruption within a few minutes of running. With this patch it survives several iterations/hours of running the reproducer. Fixes: 3fd0a5585eb9 ("Btrfs: Metadata ENOSPC handling for balance") CC: stable@vger.kernel.org Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01btrfs: also add stripe entries for NOCOW writesJohannes Thumshirn
NOCOW writes do not generate stripe_extent entries in the RAID stripe tree, as the RAID stripe-tree feature initially was designed with a zoned filesystem in mind and on a zoned filesystem, we do not allow NOCOW writes. But the RAID stripe-tree feature is independent from the zoned feature, so we must also do NOCOW writes for RAID stripe-tree filesystems. Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>