summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2024-08-03Merge tag 'xfs-6.11-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Chandan Babu: - Fix memory leak when corruption is detected during scrubbing parent pointers - Allow SECURE namespace xattrs to use reserved block pool to in order to prevent ENOSPC - Save stack space by passing tracepoint's char array to file_path() instead of another stack variable - Remove unused parameter in macro XFS_DQUOT_LOGRES - Replace comma with semicolon in a couple of places * tag 'xfs-6.11-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: convert comma to semicolon xfs: convert comma to semicolon xfs: remove unused parameter in macro XFS_DQUOT_LOGRES xfs: fix file_path handling in tracepoints xfs: allow SECURE namespace xattrs to use reserved block pool xfs: fix a memory leak
2024-08-02btrfs: avoid using fixed char array size for tree namesQu Wenruo
[BUG] There is a bug report that using the latest trunk GCC 15, btrfs would cause unterminated-string-initialization warning: linux-6.6/fs/btrfs/print-tree.c:29:49: error: initializer-string for array of ‘char’ is too long [-Werror=unterminated-string-initialization] 29 | { BTRFS_BLOCK_GROUP_TREE_OBJECTID, "BLOCK_GROUP_TREE" }, | ^~~~~~~~~~~~~~~~~~ [CAUSE] To print tree names we have an array of root_name_map structure, which uses "char name[16];" to store the name string of a tree. But the following trees have names exactly at 16 chars length: - "BLOCK_GROUP_TREE" - "RAID_STRIPE_TREE" This means we will have no space for the terminating '\0', and can lead to unexpected access when printing the name. [FIX] Instead of "char name[16];" use "const char *" instead. Since the name strings are all read-only data, and are all NULL terminated by default, there is not much need to bother the length at all. Reported-by: Sam James <sam@gentoo.org> Reported-by: Alejandro Colomar <alx@kernel.org> Fixes: edde81f1abf29 ("btrfs: add raid stripe tree pretty printer") Fixes: 9c54e80ddc6bd ("btrfs: add code to support the block group root") CC: stable@vger.kernel.org # 6.1+ Suggested-by: Alejandro Colomar <alx@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Alejandro Colomar <alx@kernel.org> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-08-02btrfs: fix double inode unlock for direct IO sync writesFilipe Manana
If we do a direct IO sync write, at btrfs_sync_file(), and we need to skip inode logging or we get an error starting a transaction or an error when flushing delalloc, we end up unlocking the inode when we shouldn't under the 'out_release_extents' label, and then unlock it again at btrfs_direct_write(). Fix that by checking if we have to skip inode unlocking under that label. Reported-by: syzbot+7dbbb74af6291b5a5a8b@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/000000000000dfd631061eaeb4bc@google.com/ Fixes: 939b656bc8ab ("btrfs: fix corruption after buffer fault in during direct IO append write") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-08-02Merge tag 'ceph-for-6.11-rc2' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fix from Ilya Dryomov: "A fix for a potential hang in the MDS when cap revocation races with the client releasing the caps in question, marked for stable" * tag 'ceph-for-6.11-rc2' of https://github.com/ceph/ceph-client: ceph: force sending a cap update msg back to MDS for revoke op
2024-08-02cifs: update internal version numberSteve French
To 2.50 Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-02smb: client: fix FSCTL_GET_REPARSE_POINT against NetAppPaulo Alcantara
NetApp server requires the file to be open with FILE_READ_EA access in order to support FSCTL_GET_REPARSE_POINT, otherwise it will return STATUS_INVALID_DEVICE_REQUEST. It doesn't make any sense because there's no requirement for FILE_READ_EA bit to be set nor STATUS_INVALID_DEVICE_REQUEST being used for something other than "unsupported reparse points" in MS-FSA. To fix it and improve compatibility, set FILE_READ_EA & SYNCHRONIZE bits to match what Windows client currently does. Tested-by: Sebastian Steinbeisser <Sebastian.Steinbeisser@lrz.de> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-02smb3: add dynamic tracepoints for shutdown ioctlSteve French
For debugging an umount failure in xfstests generic/043 generic/044 in some configurations, we needed more information on the shutdown ioctl which was suspected of being related to the cause, so tracepoints are added in this patch e.g. "trace-cmd record -e smb3_shutdown_enter -e smb3_shutdown_done -e smb3_shutdown_err" Sample output: godown-47084 [011] ..... 3313.756965: smb3_shutdown_enter: flags=0x1 tid=0x733b3e75 godown-47084 [011] ..... 3313.756968: smb3_shutdown_done: flags=0x1 tid=0x733b3e75 Tested-by: Anthony Nandaa (Microsoft) <profnandaa@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-02cifs: Remove cifs_aio_ctxDavid Howells
Remove struct cifs_aio_ctx and its associated alloc/release functions as it is no longer used, the functions being taken over by netfslib. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-02smb: client: handle lack of FSCTL_GET_REPARSE_POINT supportPaulo Alcantara
As per MS-FSA 2.1.5.10.14, support for FSCTL_GET_REPARSE_POINT is optional and if the server doesn't support it, STATUS_INVALID_DEVICE_REQUEST must be returned for the operation. If we find files with reparse points and we can't read them due to lack of client or server support, just ignore it and then treat them as regular files or junctions. Fixes: 5f71ebc41294 ("smb: client: parse reparse point flag in create response") Reported-by: Sebastian Steinbeisser <Sebastian.Steinbeisser@lrz.de> Tested-by: Sebastian Steinbeisser <Sebastian.Steinbeisser@lrz.de> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-02Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs fix from Al Viro: "do_dup2() out-of-bounds array speculation fix" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: protect the fetch of ->fd[fd] in do_dup2() from mispredictions
2024-08-01protect the fetch of ->fd[fd] in do_dup2() from mispredictionsAl Viro
both callers have verified that fd is not greater than ->max_fds; however, misprediction might end up with tofree = fdt->fd[fd]; being speculatively executed. That's wrong for the same reasons why it's wrong in close_fd()/file_close_fd_locked(); the same solution applies - array_index_nospec(fd, fdt->max_fds) could differ from fd only in case of speculative execution on mispredicted path. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-08-01btrfs: emit a warning about space cache v1 being deprecatedJosef Bacik
We've been wanting to get rid of this for a while, add a message to indicate that this feature is going away and when so we can finally have a date when we're going to remove it. The output looks like this BTRFS warning (device nvme0n1): space cache v1 is being deprecated and will be removed in a future release, please use -o space_cache=v2 Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-08-01btrfs: fix qgroup reserve leaks in cow_file_rangeBoris Burkov
In the buffered write path, the dirty page owns the qgroup reserve until it creates an ordered_extent. Therefore, any errors that occur before the ordered_extent is created must free that reservation, or else the space is leaked. The fstest generic/475 exercises various IO error paths, and is able to trigger errors in cow_file_range where we fail to get to allocating the ordered extent. Note that because we *do* clear delalloc, we are likely to remove the inode from the delalloc list, so the inodes/pages to not have invalidate/launder called on them in the commit abort path. This results in failures at the unmount stage of the test that look like: BTRFS: error (device dm-8 state EA) in cleanup_transaction:2018: errno=-5 IO failure BTRFS: error (device dm-8 state EA) in btrfs_replace_file_extents:2416: errno=-5 IO failure BTRFS warning (device dm-8 state EA): qgroup 0/5 has unreleased space, type 0 rsv 28672 ------------[ cut here ]------------ WARNING: CPU: 3 PID: 22588 at fs/btrfs/disk-io.c:4333 close_ctree+0x222/0x4d0 [btrfs] Modules linked in: btrfs blake2b_generic libcrc32c xor zstd_compress raid6_pq CPU: 3 PID: 22588 Comm: umount Kdump: loaded Tainted: G W 6.10.0-rc7-gab56fde445b8 #21 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 RIP: 0010:close_ctree+0x222/0x4d0 [btrfs] RSP: 0018:ffffb4465283be00 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffffa1a1818e1000 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffffb4465283bbe0 RDI: ffffa1a19374fcb8 RBP: ffffa1a1818e13c0 R08: 0000000100028b16 R09: 0000000000000000 R10: 0000000000000003 R11: 0000000000000003 R12: ffffa1a18ad7972c R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f9168312b80(0000) GS:ffffa1a4afcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f91683c9140 CR3: 000000010acaa000 CR4: 00000000000006f0 Call Trace: <TASK> ? close_ctree+0x222/0x4d0 [btrfs] ? __warn.cold+0x8e/0xea ? close_ctree+0x222/0x4d0 [btrfs] ? report_bug+0xff/0x140 ? handle_bug+0x3b/0x70 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? close_ctree+0x222/0x4d0 [btrfs] generic_shutdown_super+0x70/0x160 kill_anon_super+0x11/0x40 btrfs_kill_super+0x11/0x20 [btrfs] deactivate_locked_super+0x2e/0xa0 cleanup_mnt+0xb5/0x150 task_work_run+0x57/0x80 syscall_exit_to_user_mode+0x121/0x130 do_syscall_64+0xab/0x1a0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f916847a887 ---[ end trace 0000000000000000 ]--- BTRFS error (device dm-8 state EA): qgroup reserved space leaked Cases 2 and 3 in the out_reserve path both pertain to this type of leak and must free the reserved qgroup data. Because it is already an error path, I opted not to handle the possible errors in btrfs_free_qgroup_data. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2024-08-01btrfs: implement launder_folio for clearing dirty page reserveBoris Burkov
In the buffered write path, dirty pages can be said to "own" the qgroup reservation until they create an ordered_extent. It is possible for there to be outstanding dirty pages when a transaction is aborted, in which case there is no cancellation path for freeing this reservation and it is leaked. We do already walk the list of outstanding delalloc inodes in btrfs_destroy_delalloc_inodes() and call invalidate_inode_pages2() on them. This does *not* call btrfs_invalidate_folio(), as one might guess, but rather calls launder_folio() and release_folio(). Since this is a reservation associated with dirty pages only, rather than something associated with the private bit (ordered_extent is cancelled separately already in the cleanup transaction path), implementing this release should be done via launder_folio. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2024-08-01btrfs: scrub: update last_physical after scrubbing one stripeQu Wenruo
Currently sctx->stat.last_physical only got updated in the following cases: - When the last stripe of a non-RAID56 chunk is scrubbed This implies a pitfall, if the last stripe is at the chunk boundary, and we finished the scrub of the whole chunk, we won't update last_physical at all until the next chunk. - When a P/Q stripe of a RAID56 chunk is scrubbed This leads the following two problems: - sctx->stat.last_physical is not updated for a almost full chunk This is especially bad, affecting scrub resume, as the resume would start from last_physical, causing unnecessary re-scrub. - "btrfs scrub status" will not report any progress for a long time Fix the problem by properly updating @last_physical after each stripe is scrubbed. And since we're here, for the sake of consistency, use spin lock to protect the update of @last_physical, just like all the remaining call sites touching sctx->stat. Reported-by: Michel Palleau <michel.palleau@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CAMFk-+igFTv2E8svg=cQ6o3e6CrR5QwgQ3Ok9EyRaEvvthpqCQ@mail.gmail.com/ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-08-01btrfs: factor out stripe length calculation into a helperQu Wenruo
Currently there are two locations which need to calculate the real length of a stripe (which can be at the end of a chunk, and the chunk size may not always be 64K aligned). Factor them into a helper as we're going to have a third user soon. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-08-01ceph: force sending a cap update msg back to MDS for revoke opXiubo Li
If a client sends out a cap update dropping caps with the prior 'seq' just before an incoming cap revoke request, then the client may drop the revoke because it believes it's already released the requested capabilities. This causes the MDS to wait indefinitely for the client to respond to the revoke. It's therefore always a good idea to ack the cap revoke request with the bumped up 'seq'. Currently if the cap->issued equals to the newcaps the check_caps() will do nothing, we should force flush the caps. Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/61782 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-07-31lsm: Refactor return value of LSM hook inode_copy_up_xattrXu Kuohai
To be consistent with most LSM hooks, convert the return value of hook inode_copy_up_xattr to 0 or a negative error code. Before: - Hook inode_copy_up_xattr returns 0 when accepting xattr, 1 when discarding xattr, -EOPNOTSUPP if it does not know xattr, or any other negative error code otherwise. After: - Hook inode_copy_up_xattr returns 0 when accepting xattr, *-ECANCELED* when discarding xattr, -EOPNOTSUPP if it does not know xattr, or any other negative error code otherwise. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-07-30Merge tag 'for-6.11-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix regression in extent map rework when handling insertion of overlapping compressed extent - fix unexpected file length when appending to a file using direct io and buffer not faulted in - in zoned mode, fix accounting of unusable space when flipping read-only block group back to read-write - fix page locking when COWing an inline range, assertion failure found by syzbot - fix calculation of space info in debugging print - tree-checker, add validation of data reference item - fix a few -Wmaybe-uninitialized build warnings * tag 'for-6.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: initialize location to fix -Wmaybe-uninitialized in btrfs_lookup_dentry() btrfs: fix corruption after buffer fault in during direct IO append write btrfs: zoned: fix zone_unusable accounting on making block group read-write again btrfs: do not subtract delalloc from avail bytes btrfs: make cow_file_range_inline() honor locked_page on error btrfs: fix corrupt read due to bad offset of a compressed extent map btrfs: tree-checker: validate dref root and objectid
2024-07-30bcachefs: Fix double free of ca->buckets_nouseKent Overstreet
Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: ffcbec6076 ("bcachefs: Kill opts.buckets_nouse") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-30btrfs: initialize location to fix -Wmaybe-uninitialized in btrfs_lookup_dentry()David Sterba
Some arch + compiler combinations report a potentially unused variable location in btrfs_lookup_dentry(). This is a false alert as the variable is passed by value and always valid or there's an error. The compilers cannot probably reason about that although btrfs_inode_by_name() is in the same file. > + /kisskb/src/fs/btrfs/inode.c: error: 'location.objectid' may be used +uninitialized in this function [-Werror=maybe-uninitialized]: => 5603:9 > + /kisskb/src/fs/btrfs/inode.c: error: 'location.type' may be used +uninitialized in this function [-Werror=maybe-uninitialized]: => 5674:5 m68k-gcc8/m68k-allmodconfig mips-gcc8/mips-allmodconfig powerpc-gcc5/powerpc-all{mod,yes}config powerpc-gcc5/ppc64_defconfig Initialize it to zero, this should fix the warnings and won't change the behaviour as btrfs_inode_by_name() accepts only a root or inode item types, otherwise returns an error. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/linux-btrfs/bd4e9928-17b3-9257-8ba7-6b7f9bbb639a@linux-m68k.org/ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-29signal: Remove task argument from dequeue_signal()Thomas Gleixner
The task pointer which is handed to dequeue_signal() is always current. The argument along with the first comment about signalfd in that function is confusing at best. Remove it and use current internally. Update the stale comment for dequeue_signal() while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2024-07-29posix-timers: Convert timer list to hlistThomas Gleixner
No requirement for a real list. Spare a few bytes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2024-07-29btrfs: fix corruption after buffer fault in during direct IO append writeFilipe Manana
During an append (O_APPEND write flag) direct IO write if the input buffer was not previously faulted in, we can corrupt the file in a way that the final size is unexpected and it includes an unexpected hole. The problem happens like this: 1) We have an empty file, with size 0, for example; 2) We do an O_APPEND direct IO with a length of 4096 bytes and the input buffer is not currently faulted in; 3) We enter btrfs_direct_write(), lock the inode and call generic_write_checks(), which calls generic_write_checks_count(), and that function sets the iocb position to 0 with the following code: if (iocb->ki_flags & IOCB_APPEND) iocb->ki_pos = i_size_read(inode); 4) We call btrfs_dio_write() and enter into iomap, which will end up calling btrfs_dio_iomap_begin() and that calls btrfs_get_blocks_direct_write(), where we update the i_size of the inode to 4096 bytes; 5) After btrfs_dio_iomap_begin() returns, iomap will attempt to access the page of the write input buffer (at iomap_dio_bio_iter(), with a call to bio_iov_iter_get_pages()) and fail with -EFAULT, which gets returned to btrfs at btrfs_direct_write() via btrfs_dio_write(); 6) At btrfs_direct_write() we get the -EFAULT error, unlock the inode, fault in the write buffer and then goto to the label 'relock'; 7) We lock again the inode, do all the necessary checks again and call again generic_write_checks(), which calls generic_write_checks_count() again, and there we set the iocb's position to 4K, which is the current i_size of the inode, with the following code pointed above: if (iocb->ki_flags & IOCB_APPEND) iocb->ki_pos = i_size_read(inode); 8) Then we go again to btrfs_dio_write() and enter iomap and the write succeeds, but it wrote to the file range [4K, 8K), leaving a hole in the [0, 4K) range and an i_size of 8K, which goes against the expectations of having the data written to the range [0, 4K) and get an i_size of 4K. Fix this by not unlocking the inode before faulting in the input buffer, in case we get -EFAULT or an incomplete write, and not jumping to the 'relock' label after faulting in the buffer - instead jump to a location immediately before calling iomap, skipping all the write checks and relocking. This solves this problem and it's fine even in case the input buffer is memory mapped to the same file range, since only holding the range locked in the inode's io tree can cause a deadlock, it's safe to keep the inode lock (VFS lock), as was fixed and described in commit 51bd9563b678 ("btrfs: fix deadlock due to page faults during direct IO reads and writes"). A sample reproducer provided by a reporter is the following: $ cat test.c #ifndef _GNU_SOURCE #define _GNU_SOURCE #endif #include <fcntl.h> #include <stdio.h> #include <sys/mman.h> #include <sys/stat.h> #include <unistd.h> int main(int argc, char *argv[]) { if (argc < 2) { fprintf(stderr, "Usage: %s <test file>\n", argv[0]); return 1; } int fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT | O_APPEND, 0644); if (fd < 0) { perror("creating test file"); return 1; } char *buf = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); ssize_t ret = write(fd, buf, 4096); if (ret < 0) { perror("pwritev2"); return 1; } struct stat stbuf; ret = fstat(fd, &stbuf); if (ret < 0) { perror("stat"); return 1; } printf("size: %llu\n", (unsigned long long)stbuf.st_size); return stbuf.st_size == 4096 ? 0 : 1; } A test case for fstests will be sent soon. Reported-by: Hanna Czenczek <hreitz@redhat.com> Link: https://lore.kernel.org/linux-btrfs/0b841d46-12fe-4e64-9abb-871d8d0de271@redhat.com/ Fixes: 8184620ae212 ("btrfs: fix lost file sync on direct IO write with nowait and dsync iocb") CC: stable@vger.kernel.org # 6.1+ Tested-by: Hanna Czenczek <hreitz@redhat.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-29btrfs: zoned: fix zone_unusable accounting on making block group read-write ↵Naohiro Aota
again When btrfs makes a block group read-only, it adds all free regions in the block group to space_info->bytes_readonly. That free space excludes reserved and pinned regions. OTOH, when btrfs makes the block group read-write again, it moves all the unused regions into the block group's zone_unusable. That unused region includes reserved and pinned regions. As a result, it counts too much zone_unusable bytes. Fortunately (or unfortunately), having erroneous zone_unusable does not affect the calculation of space_info->bytes_readonly, because free space (num_bytes in btrfs_dec_block_group_ro) calculation is done based on the erroneous zone_unusable and it reduces the num_bytes just to cancel the error. This behavior can be easily discovered by adding a WARN_ON to check e.g, "bg->pinned > 0" in btrfs_dec_block_group_ro(), and running fstests test case like btrfs/282. Fix it by properly considering pinned and reserved in btrfs_dec_block_group_ro(). Also, add a WARN_ON and introduce btrfs_space_info_update_bytes_zone_unusable() to catch a similar mistake. Fixes: 169e0da91a21 ("btrfs: zoned: track unusable bytes for zones") CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-29btrfs: do not subtract delalloc from avail bytesNaohiro Aota
The block group's avail bytes printed when dumping a space info subtract the delalloc_bytes. However, as shown in btrfs_add_reserved_bytes() and btrfs_free_reserved_bytes(), it is added or subtracted along with "reserved" for the delalloc case, which means the "delalloc_bytes" is a part of the "reserved" bytes. So, excluding it to calculate the avail space counts delalloc_bytes twice, which can lead to an invalid result. Fixes: e50b122b832b ("btrfs: print available space for a block group when dumping a space info") CC: stable@vger.kernel.org # 6.6+ Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-29btrfs: make cow_file_range_inline() honor locked_page on errorBoris Burkov
The btrfs buffered write path runs through __extent_writepage() which has some tricky return value handling for writepage_delalloc(). Specifically, when that returns 1, we exit, but for other return values we continue and end up calling btrfs_folio_end_all_writers(). If the folio has been unlocked (note that we check the PageLocked bit at the start of __extent_writepage()), this results in an assert panic like this one from syzbot: BTRFS: error (device loop0 state EAL) in free_log_tree:3267: errno=-5 IO failure BTRFS warning (device loop0 state EAL): Skipping commit of aborted transaction. BTRFS: error (device loop0 state EAL) in cleanup_transaction:2018: errno=-5 IO failure assertion failed: folio_test_locked(folio), in fs/btrfs/subpage.c:871 ------------[ cut here ]------------ kernel BUG at fs/btrfs/subpage.c:871! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 1 PID: 5090 Comm: syz-executor225 Not tainted 6.10.0-syzkaller-05505-gb1bc554e009e #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024 RIP: 0010:btrfs_folio_end_all_writers+0x55b/0x610 fs/btrfs/subpage.c:871 Code: e9 d3 fb ff ff e8 25 22 c2 fd 48 c7 c7 c0 3c 0e 8c 48 c7 c6 80 3d 0e 8c 48 c7 c2 60 3c 0e 8c b9 67 03 00 00 e8 66 47 ad 07 90 <0f> 0b e8 6e 45 b0 07 4c 89 ff be 08 00 00 00 e8 21 12 25 fe 4c 89 RSP: 0018:ffffc900033d72e0 EFLAGS: 00010246 RAX: 0000000000000045 RBX: 00fff0000000402c RCX: 663b7a08c50a0a00 RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000 RBP: ffffc900033d73b0 R08: ffffffff8176b98c R09: 1ffff9200067adfc R10: dffffc0000000000 R11: fffff5200067adfd R12: 0000000000000001 R13: dffffc0000000000 R14: 0000000000000000 R15: ffffea0001cbee80 FS: 0000000000000000(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f5f076012f8 CR3: 000000000e134000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __extent_writepage fs/btrfs/extent_io.c:1597 [inline] extent_write_cache_pages fs/btrfs/extent_io.c:2251 [inline] btrfs_writepages+0x14d7/0x2760 fs/btrfs/extent_io.c:2373 do_writepages+0x359/0x870 mm/page-writeback.c:2656 filemap_fdatawrite_wbc+0x125/0x180 mm/filemap.c:397 __filemap_fdatawrite_range mm/filemap.c:430 [inline] __filemap_fdatawrite mm/filemap.c:436 [inline] filemap_flush+0xdf/0x130 mm/filemap.c:463 btrfs_release_file+0x117/0x130 fs/btrfs/file.c:1547 __fput+0x24a/0x8a0 fs/file_table.c:422 task_work_run+0x24f/0x310 kernel/task_work.c:222 exit_task_work include/linux/task_work.h:40 [inline] do_exit+0xa2f/0x27f0 kernel/exit.c:877 do_group_exit+0x207/0x2c0 kernel/exit.c:1026 __do_sys_exit_group kernel/exit.c:1037 [inline] __se_sys_exit_group kernel/exit.c:1035 [inline] __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1035 x64_sys_call+0x2634/0x2640 arch/x86/include/generated/asm/syscalls_64.h:232 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f5f075b70c9 Code: Unable to access opcode bytes at 0x7f5f075b709f. I was hitting the same issue by doing hundreds of accelerated runs of generic/475, which also hits IO errors by design. I instrumented that reproducer with bpftrace and found that the undesirable folio_unlock was coming from the following callstack: folio_unlock+5 __process_pages_contig+475 cow_file_range_inline.constprop.0+230 cow_file_range+803 btrfs_run_delalloc_range+566 writepage_delalloc+332 __extent_writepage # inlined in my stacktrace, but I added it here extent_write_cache_pages+622 Looking at the bisected-to patch in the syzbot report, Josef realized that the logic of the cow_file_range_inline error path subtly changing. In the past, on error, it jumped to out_unlock in cow_file_range(), which honors the locked_page, so when we ultimately call folio_end_all_writers(), the folio of interest is still locked. After the change, we always unlocked ignoring the locked_page, on both success and error. On the success path, this all results in returning 1 to __extent_writepage(), which skips the folio_end_all_writers() call, which makes it OK to have unlocked. Fix the bug by wiring the locked_page into cow_file_range_inline() and only setting locked_page to NULL on success. Reported-by: syzbot+a14d8ac9af3a2a4fd0c8@syzkaller.appspotmail.com Fixes: 0586d0a89e77 ("btrfs: move extent bit and page cleanup into cow_file_range_inline") CC: stable@vger.kernel.org # 6.10+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-29x86/elf: Add a new FPU buffer layout info to x86 core filesVignesh Balasubramanian
Add a new .note section containing type, size, offset and flags of every xfeature that is present. This information will be used by debuggers to understand the XSAVE layout of the machine where the core file has been dumped, and to read XSAVE registers, especially during cross-platform debugging. The XSAVE layouts of modern AMD and Intel CPUs differ, especially since Memory Protection Keys and the AVX-512 features have been inculcated into the AMD CPUs. Since AMD never adopted (and hence never left room in the XSAVE layout for) the Intel MPX feature, tools like GDB had assumed a fixed XSAVE layout matching that of Intel (based on the XCR0 mask). Hence, core dumps from AMD CPUs didn't match the known size for the XCR0 mask. This resulted in GDB and other tools not being able to access the values of the AVX-512 and PKRU registers on AMD CPUs. To solve this, an interim solution has been accepted into GDB, and is already a part of GDB 14, see https://sourceware.org/pipermail/gdb-patches/2023-March/198081.html. But it depends on heuristics based on the total XSAVE register set size and the XCR0 mask to infer the layouts of the various register blocks for core dumps, and hence, is not a foolproof mechanism to determine the layout of the XSAVE area. Therefore, add a new core dump note in order to allow GDB/LLDB and other relevant tools to determine the layout of the XSAVE area of the machine where the corefile was dumped. The new core dump note (which is being proposed as a per-process .note section), NT_X86_XSAVE_LAYOUT (0x205) contains an array of structures. Each structure describes an individual extended feature containing offset, size and flags in this format: struct x86_xfeat_component { u32 type; u32 size; u32 offset; u32 flags; }; and in an independent manner, allowing for future extensions without depending on hw arch specifics like CPUID etc. [ bp: Massage commit message, zap trailing whitespace. ] Co-developed-by: Jini Susan George <jinisusan.george@amd.com> Signed-off-by: Jini Susan George <jinisusan.george@amd.com> Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Vignesh Balasubramanian <vigbalas@amd.com> Link: https://lore.kernel.org/r/20240725161017.112111-2-vigbalas@amd.com
2024-07-29Merge drm/drm-next into drm-misc-nextThomas Zimmermann
Backmerging to get a late RC of v6.10 before moving into v6.11. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2024-07-29xfs: convert comma to semicolonChen Ni
Replace a comma between expression statements by a semicolon. Fixes: 178b48d588ea ("xfs: remove the for_each_xbitmap_ helpers") Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-07-29xfs: convert comma to semicolonChen Ni
Replace a comma between expression statements by a semicolon. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Fixes: 8f4b980ee67f ("xfs: pass the attr value to put_listent when possible") Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-07-29xfs: remove unused parameter in macro XFS_DQUOT_LOGRESJulian Sun
In the macro definition of XFS_DQUOT_LOGRES, a parameter is accepted, but it is not used. Hence, it should be removed. This patch has only passed compilation test, but it should be fine. Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-07-29xfs: fix file_path handling in tracepointsDarrick J. Wong
Since file_path() takes the output buffer as one of its arguments, we might as well have it format directly into the tracepoint's char array instead of wasting stack space. Fixes: 3934e8ebb7cc6 ("xfs: create a big array data structure") Fixes: 5076a6040ca16 ("xfs: support in-memory buffer cache targets") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202403290419.HPcyvqZu-lkp@intel.com/ Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-07-29xfs: allow SECURE namespace xattrs to use reserved block poolEric Sandeen
We got a report from the podman folks that selinux relabels that happen as part of their process were returning ENOSPC when the filesystem is completely full. This is because xattr changes reserve about 15 blocks for the worst case, but the common case is for selinux contexts to be the sole, in-inode xattr and consume no blocks. We already allow reserved space consumption for XFS_ATTR_ROOT for things such as ACLs, and SECURE namespace attributes are not so very different, so allow them to use the reserved space as well. Code-comment-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> V2: Remove local variable, add comment. V3: Add Dave's preferred comment V4: Spelling and comment beautification
2024-07-29xfs: fix a memory leakDarrick J. Wong
kmemleak reported that we don't free the parent pointer names here if we found corruption. Fixes: 0d29a20fbdba8 ("xfs: scrub parent pointers") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-07-28minmax: don't use max() in situations that want a C constant expressionLinus Torvalds
We only had a couple of array[] declarations, and changing them to just use 'MAX()' instead of 'max()' fixes the issue. This will allow us to simplify our min/max macros enormously, since they can now unconditionally use temporary variables to avoid using the argument values multiple times. Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-28Merge tag 'ubifs-for-linus-6.11-rc1-take2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBI and UBIFS updates from Richard Weinberger: - Many fixes for power-cut issues by Zhihao Cheng - Another ubiblock error path fix - ubiblock section mismatch fix - Misc fixes all over the place * tag 'ubifs-for-linus-6.11-rc1-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubi: Fix ubi_init() ubiblock_exit() section mismatch ubifs: add check for crypto_shash_tfm_digest ubifs: Fix inconsistent inode size when powercut happens during appendant writing ubi: block: fix null-pointer-dereference in ubiblock_create() ubifs: fix kernel-doc warnings ubifs: correct UBIFS_DFS_DIR_LEN macro definition and improve code clarity mtd: ubi: Restore missing cleanup on ubi_init() failure path ubifs: dbg_orphan_check: Fix missed key type checking ubifs: Fix unattached inode when powercut happens in creating ubifs: Fix space leak when powercut happens in linking tmpfile ubifs: Move ui->data initialization after initializing security ubifs: Fix adding orphan entry twice for the same inode ubifs: Remove insert_dead_orphan from replaying orphan process Revert "ubifs: ubifs_symlink: Fix memleak of inode->i_link in error path" ubifs: Don't add xattr inode into orphan area ubifs: Fix unattached xattr inode if powercut happens after deleting mtd: ubi: avoid expensive do_div() on 32-bit machines mtd: ubi: make ubi_class constant ubi: eba: properly rollback inside self_check_eba
2024-07-28Merge tag 'unicode-next-6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode Pull unicode update from Gabriel Krisman Bertazi: "Two small fixes to silence the compiler and static analyzers tools from Ben Dooks and Jeff Johnson" * tag 'unicode-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode: unicode: add MODULE_DESCRIPTION() macros unicode: make utf8 test count static
2024-07-27Merge tag '6.11-rc-smb-client-fixes-part2' of ↵Linus Torvalds
git://git.samba.org/sfrench/cifs-2.6 Pull more smb client updates from Steve French: - fix for potential null pointer use in init cifs - additional dynamic trace points to improve debugging of some common scenarios - two SMB1 fixes (one addressing reconnect with POSIX extensions, one a mount parsing error) * tag '6.11-rc-smb-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: smb3: add dynamic trace point for session setup key expired failures smb3: add four dynamic tracepoints for copy_file_range and reflink smb3: add dynamic tracepoint for reflink errors cifs: mount with "unix" mount option for SMB1 incorrectly handled cifs: fix reconnect with SMB1 UNIX Extensions cifs: fix potential null pointer use in destroy_workqueue in init_cifs error path
2024-07-27Merge tag 'vfs-6.11-rc1.fixes.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "This contains two fixes for this merge window: VFS: - I noticed that it is possible for a privileged user to mount most filesystems with a non-initial user namespace in sb->s_user_ns. When fsopen() is called in a non-init namespace the caller's namespace is recorded in fs_context->user_ns. If the returned file descriptor is then passed to a process privileged in init_user_ns, that process can call fsconfig(fd_fs, FSCONFIG_CMD_CREATE*), creating a new superblock with sb->s_user_ns set to the namespace of the process which called fsopen(). This is problematic as only filesystems that raise FS_USERNS_MOUNT are known to be able to support a non-initial s_user_ns. Others may suffer security issues, on-disk corruption or outright crash the kernel. Prevent that by restricting such delegation to filesystems that allow FS_USERNS_MOUNT. Note, that this delegation requires a privileged process to actually create the superblock so either the privileged process is cooperaing or someone must have tricked a privileged process into operating on a fscontext file descriptor whose origin it doesn't know (a stupid idea). The bug dates back to about 5 years afaict. Misc: - Fix hostfs parsing when the mount request comes in via the legacy mount api. In the legacy mount api hostfs allows to specify the host directory mount without any key. Restore that behavior" * tag 'vfs-6.11-rc1.fixes.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: hostfs: fix the host directory parse when mounting. fs: don't allow non-init s_user_ns for filesystems without FS_USERNS_MOUNT
2024-07-27Merge tag 'mm-hotfixes-stable-2024-07-26-14-33' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "11 hotfixes, 7 of which are cc:stable. 7 are MM, 4 are other" * tag 'mm-hotfixes-stable-2024-07-26-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: nilfs2: handle inconsistent state in nilfs_btnode_create_block() selftests/mm: skip test for non-LPA2 and non-LVA systems mm/page_alloc: fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist() mm: memcg: add cacheline padding after lruvec in mem_cgroup_per_node alloc_tag: outline and export free_reserved_page() decompress_bunzip2: fix rare decompression failure mm/huge_memory: avoid PMD-size page cache if needed mm: huge_memory: use !CONFIG_64BIT to relax huge page alignment on 32 bit machines mm: fix old/young bit handling in the faulting path dt-bindings: arm: update James Clark's email address MAINTAINERS: mailmap: update James Clark's email address
2024-07-27hostfs: fix the host directory parse when mounting.Hongbo Li
hostfs not keep the host directory when mounting. When the host directory is none (default), fc->source is used as the host root directory, and this is wrong. Here we use `parse_monolithic` to handle the old mount path for parsing the root directory. For new mount path, The `parse_param` is used for the host directory parse. Reported-and-tested-by: Maciej Żenczykowski <maze@google.com> Fixes: cd140ce9f611 ("hostfs: convert hostfs to use the new mount API") Link: https://lore.kernel.org/all/CANP3RGceNzwdb7w=vPf5=7BCid5HVQDmz1K5kC9JG42+HVAh_g@mail.gmail.com/ Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Link: https://lore.kernel.org/r/20240725065130.1821964-1-lihongbo22@huawei.com [brauner: minor fixes] Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-27fs: don't allow non-init s_user_ns for filesystems without FS_USERNS_MOUNTSeth Forshee (DigitalOcean)
Christian noticed that it is possible for a privileged user to mount most filesystems with a non-initial user namespace in sb->s_user_ns. When fsopen() is called in a non-init namespace the caller's namespace is recorded in fs_context->user_ns. If the returned file descriptor is then passed to a process priviliged in init_user_ns, that process can call fsconfig(fd_fs, FSCONFIG_CMD_CREATE), creating a new superblock with sb->s_user_ns set to the namespace of the process which called fsopen(). This is problematic. We cannot assume that any filesystem which does not set FS_USERNS_MOUNT has been written with a non-initial s_user_ns in mind, increasing the risk for bugs and security issues. Prevent this by returning EPERM from sget_fc() when FS_USERNS_MOUNT is not set for the filesystem and a non-initial user namespace will be used. sget() does not need to be updated as it always uses the user namespace of the current context, or the initial user namespace if SB_SUBMOUNT is set. Fixes: cb50b348c71f ("convenience helpers: vfs_get_super() and sget_fc()") Reported-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> Link: https://lore.kernel.org/r/20240724-s_user_ns-fix-v1-1-895d07c94701@kernel.org Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-26nilfs2: handle inconsistent state in nilfs_btnode_create_block()Ryusuke Konishi
Syzbot reported that a buffer state inconsistency was detected in nilfs_btnode_create_block(), triggering a kernel bug. It is not appropriate to treat this inconsistency as a bug; it can occur if the argument block address (the buffer index of the newly created block) is a virtual block number and has been reallocated due to corruption of the bitmap used to manage its allocation state. So, modify nilfs_btnode_create_block() and its callers to treat it as a possible filesystem error, rather than triggering a kernel bug. Link: https://lkml.kernel.org/r/20240725052007.4562-1-konishi.ryusuke@gmail.com Fixes: a60be987d45d ("nilfs2: B-tree node cache") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+89cc4f2324ed37988b60@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=89cc4f2324ed37988b60 Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-26smb3: add dynamic trace point for session setup key expired failuresSteve French
There are cases where services need to remount (or change their credentials files) when keys have expired, but it can be helpful to have a dynamic trace point to make it easier to notify the service to refresh the storage account key. Here is sample output, one from mount with bad password, one from a reconnect where the password has been changed or expired and reconnect fails (requiring remount with new storage account key) TASK-PID CPU# ||||| TIMESTAMP FUNCTION | | | ||||| | | mount.cifs-11362 [000] ..... 6000.241620: smb3_key_expired: rc=-13 user=testpassu conn_id=0x2 server=localhost addr=127.0.0.1:445 kworker/4:0-8458 [004] ..... 6044.892283: smb3_key_expired: rc=-13 user=testpassu conn_id=0x3 server=localhost addr=127.0.0.1:445 Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-07-26Merge tag 'ceph-for-6.11-rc1' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "A small patchset to address bogus I/O errors and ultimately an assertion failure in the face of watch errors with -o exclusive mappings in RBD marked for stable and some assorted CephFS fixes" * tag 'ceph-for-6.11-rc1' of https://github.com/ceph/ceph-client: rbd: don't assume rbd_is_lock_owner() for exclusive mappings rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive mappings rbd: rename RBD_LOCK_STATE_RELEASING and releasing_wait ceph: fix incorrect kmalloc size of pagevec mempool ceph: periodically flush the cap releases ceph: convert comma to semicolon in __ceph_dentry_dir_lease_touch() ceph: use cap_wait_list only if debugfs is enabled
2024-07-26smb3: add four dynamic tracepoints for copy_file_range and reflinkSteve French
Add more dynamic tracepoints to help debug copy_file_range (copychunk) and clone_range ("duplicate extents"). These are tracepoints for entering the function and completing without error. For example: "trace-cmd record -e smb3_copychunk_enter -e smb3_copychunk_done" or "trace-cmd record -e smb3_clone_enter -e smb3_clone_done" Here is sample output: TASK-PID CPU# ||||| TIMESTAMP FUNCTION | | | ||||| | | cp-5964 [005] ..... 2176.168977: smb3_clone_enter: xid=17 sid=0xeb275be4 tid=0x7ffa7cdb source fid=0x1ed02e15 source offset=0x0 target fid=0x1ed02e15 target offset=0x0 len=0xa0000 cp-5964 [005] ..... 2176.170668: smb3_clone_done: xid=17 sid=0xeb275be4 tid=0x7ffa7cdb source fid=0x1ed02e15 source offset=0x0 target fid=0x1ed02e15 target offset=0x0 len=0xa0000 Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-07-26smb3: add dynamic tracepoint for reflink errorsSteve French
There are cases where debugging clone_range ("smb2_duplicate_extents" function) and in the future copy_range ("smb2_copychunk_range") can be helpful. Add dynamic trace points for any errors in clone, and a followon patch will add them for copychunk. "trace-cmd record -e smb3_clone_err" Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-07-26Merge tag 'erofs-for-6.11-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull more erofs updates from Gao Xiang: - Support STATX_DIOALIGN and FS_IOC_GETFSSYSFSPATH - Fix a race of LZ4 decompression due to recent refactoring - Another multi-page folio adaption in erofs_bread() * tag 'erofs-for-6.11-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: convert comma to semicolon erofs: support multi-page folios for erofs_bread() erofs: add support for FS_IOC_GETFSSYSFSPATH erofs: fix race in z_erofs_get_gbuf() erofs: support STATX_DIOALIGN
2024-07-26erofs: convert comma to semicolonChen Ni
Replace a comma between expression statements by a semicolon. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Link: https://lore.kernel.org/r/20240724020721.2389738-1-nichen@iscas.ac.cn Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>