summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-03-17fs/dax: don't skip locked entries when scanning entriesAlistair Popple
Several functions internal to FS DAX use the following pattern when trying to obtain an unlocked entry: xas_for_each(&xas, entry, end_idx) { if (dax_is_locked(entry)) entry = get_unlocked_entry(&xas, 0); This is problematic because get_unlocked_entry() will get the next present entry in the range, and the next entry may not be locked. Therefore any processing of the original locked entry will be skipped. This can cause dax_layout_busy_page_range() to miss DMA-busy pages in the range, leading file systems to free blocks whilst DMA operations are ongoing which can lead to file system corruption. Instead callers from within a xas_for_each() loop should be waiting for the current entry to be unlocked without advancing the XArray state so a new function is introduced to wait. Also while we are here rename get_unlocked_entry() to get_next_unlocked_entry() to make it clear that it may advance the iterator state. Link: https://lkml.kernel.org/r/b11b2baed7157dc900bf07a4571bf71b7cd82d97.1740713401.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Asahi Lina <lina@asahilina.net> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chunyan Zhang <zhang.lyra@gmail.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: linmiaohe <linmiaohe@huawei.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Michael "Camp Drill Sergeant" Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17fs/dax: return unmapped busy pages from dax_layout_busy_page_range()Alistair Popple
dax_layout_busy_page_range() is used by file systems to scan the DAX page-cache to unmap mapping pages from user-space and to determine if any pages in the given range are busy, either due to ongoing DMA or other get_user_pages() usage. Currently it checks to see the file mapping is mapped into user-space with mapping_mapped() and returns early if not, skipping the check for DMA busy pages. This is wrong as pages may still be undergoing DMA access even if they have subsequently been unmapped from user-space. Fix this by dropping the check for mapping_mapped(). Link: https://lkml.kernel.org/r/d85ce6c2d1400ff111ed7302d9eef223d0243c57.1740713401.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Suggested-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Balbir Singh <balbirs@nvidia.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Asahi Lina <lina@asahilina.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chunyan Zhang <zhang.lyra@gmail.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: linmiaohe <linmiaohe@huawei.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Michael "Camp Drill Sergeant" Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17fuse: fix dax truncate/punch_hole fault pathAlistair Popple
Patch series "fs/dax: Fix ZONE_DEVICE page reference counts", v9. Device and FS DAX pages have always maintained their own page reference counts without following the normal rules for page reference counting. In particular pages are considered free when the refcount hits one rather than zero and refcounts are not added when mapping the page. Tracking this requires special PTE bits (PTE_DEVMAP) and a secondary mechanism for allowing GUP to hold references on the page (see get_dev_pagemap). However there doesn't seem to be any reason why FS DAX pages need their own reference counting scheme. By treating the refcounts on these pages the same way as normal pages we can remove a lot of special checks. In particular pXd_trans_huge() becomes the same as pXd_leaf(), although I haven't made that change here. It also frees up a valuable SW define PTE bit on architectures that have devmap PTE bits defined. It also almost certainly allows further clean-up of the devmap managed functions, but I have left that as a future improvment. It also enables support for compound ZONE_DEVICE pages which is one of my primary motivators for doing this work. This patch (of 20): FS DAX requires file systems to call into the DAX layout prior to unlinking inodes to ensure there is no ongoing DMA or other remote access to the direct mapped page. The fuse file system implements fuse_dax_break_layouts() to do this which includes a comment indicating that passing dmap_end == 0 leads to unmapping of the whole file. However this is not true - passing dmap_end == 0 will not unmap anything before dmap_start, and further more dax_layout_busy_page_range() will not scan any of the range to see if there maybe ongoing DMA access to the range. Fix this by passing -1 for dmap_end to fuse_dax_break_layouts() which will invalidate the entire file range to dax_layout_busy_page_range(). Link: https://lkml.kernel.org/r/cover.8068ad144a7eea4a813670301f4d2a86a8e68ec4.1740713401.git-series.apopple@nvidia.com Link: https://lkml.kernel.org/r/f09a34b6c40032022e4ddee6fadb7cc676f08867.1740713401.git-series.apopple@nvidia.com Fixes: 6ae330cad6ef ("virtiofs: serialize truncate/punch_hole and dax fault path") Signed-off-by: Alistair Popple <apopple@nvidia.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Balbir Singh <balbirs@nvidia.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Asahi Lina <lina@asahilina.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chunyan Zhang <zhang.lyra@gmail.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: linmiaohe <linmiaohe@huawei.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Michael "Camp Drill Sergeant" Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17dax: use folios more widely within DAXMatthew Wilcox (Oracle)
Convert from pfn to folio instead of page and use those folios throughout to avoid accesses to page->index and page->mapping. Link: https://lkml.kernel.org/r/20241216155408.8102-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jane Chu <jane.chu@oracle.com> Cc: Dan Willaims <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Alistair Popple <apopple@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17dax: remove access to page->indexMatthew Wilcox (Oracle)
This looks like a complete mess (why are we setting page->index at page fault time?), but I no longer care about DAX, and there's no reason to let DAX hold us back from removing page->index. Link: https://lkml.kernel.org/r/20241216155408.8102-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-18ext4: clear DISCARD flag if device does not support discardDiangang Li
commit 79add3a3f795e ("ext4: notify when discard is not supported") noted that keeping the DISCARD flag is for possibility that the underlying device might change in future even without file system remount. However, this scenario has rarely occurred in practice on the device side. Even if it does occur, it can be resolved with remount. Clearing the DISCARD flag not only prevents confusion caused by mount options but also avoids sending unnecessary discard commands. Signed-off-by: Diangang Li <lidiangang@bytedance.com> Link: https://patch.msgid.link/20250311021310.669524-1-lidiangang@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-18jbd2: remove jbd2_journal_unfile_buffer()Baokun Li
Since the function jbd2_journal_unfile_buffer() is no longer called anywhere after commit e5a120aeb57f ("jbd2: remove journal_head from descriptor buffers"), so let's remove it. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250306063240.157884-1-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-18ext4: reorder capability check lastChristian Göttsche
capable() calls refer to enabled LSMs whether to permit or deny the request. This is relevant in connection with SELinux, where a capability check results in a policy decision and by default a denial message on insufficient permission is issued. It can lead to three undesired cases: 1. A denial message is generated, even in case the operation was an unprivileged one and thus the syscall succeeded, creating noise. 2. To avoid the noise from 1. the policy writer adds a rule to ignore those denial messages, hiding future syscalls, where the task performs an actual privileged operation, leading to hidden limited functionality of that task. 3. To avoid the noise from 1. the policy writer adds a rule to permit the task the requested capability, while it does not need it, violating the principle of least privilege. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Reviewed-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250302160657.127253-2-cgoettsche@seltendoof.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-18ext4: update the comment about mb_optimize_scanZizhi Wo
Commit 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning") introduces the sysfs control interface "mb_max_linear_groups" to address the problem that rotational devices performance degrades when the "mb_optimize_scan" feature is enabled, which may result in distant block group allocation. However, the name of the interface was incorrect in the comment to the ext4/mballoc.c file, and this patch fixes it, without further changes. Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250224012005.689549-1-wozizhi@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-18jbd2: fix off-by-one while erasing journalZhang Yi
In __jbd2_journal_erase(), the block_stop parameter includes the last block of a contiguous region; however, the calculation of byte_stop is incorrect, as it does not account for the bytes in that last block. Consequently, the page cache is not cleared properly, which occasionally causes the ext4/050 test to fail. Since block_stop operates on inclusion semantics, it involves repeated increments and decrements by 1, significantly increasing the complexity of the calculations. Optimize the calculation and fix the incorrect byte_stop by make both block_stop and byte_stop to use exclusion semantics. This fixes a failure in fstests ext4/050. Fixes: 01d5d96542fd ("ext4: add discard/zeroout flags to journal flush") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250217065955.3829229-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-18ext4: remove references to bh->b_pageMatthew Wilcox (Oracle)
Buffer heads are attached to folios, not to pages. Also flush_dcache_page() is now deprecated in favour of flush_dcache_folio(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250213182303.2133205-1-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-18ext4: goto right label 'out_mmap_sem' in ext4_setattr()Baokun Li
Otherwise, if ext4_inode_attach_jinode() fails, a hung task will happen because filemap_invalidate_unlock() isn't called to unlock mapping->invalidate_lock. Like this: EXT4-fs error (device sda) in ext4_setattr:5557: Out of memory INFO: task fsstress:374 blocked for more than 122 seconds. Not tainted 6.14.0-rc1-next-20250206-xfstests-dirty #726 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:fsstress state:D stack:0 pid:374 tgid:374 ppid:373 task_flags:0x440140 flags:0x00000000 Call Trace: <TASK> __schedule+0x2c9/0x7f0 schedule+0x27/0xa0 schedule_preempt_disabled+0x15/0x30 rwsem_down_read_slowpath+0x278/0x4c0 down_read+0x59/0xb0 page_cache_ra_unbounded+0x65/0x1b0 filemap_get_pages+0x124/0x3e0 filemap_read+0x114/0x3d0 vfs_read+0x297/0x360 ksys_read+0x6c/0xe0 do_syscall_64+0x4b/0x110 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: c7fc0366c656 ("ext4: partial zero eof block on unaligned inode size extension") Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Brian Foster <bfoster@redhat.com> Link: https://patch.msgid.link/20250213112247.3168709-1-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-18ext4: fix out-of-bound read in ext4_xattr_inode_dec_ref_all()Ye Bin
There's issue as follows: BUG: KASAN: use-after-free in ext4_xattr_inode_dec_ref_all+0x6ff/0x790 Read of size 4 at addr ffff88807b003000 by task syz-executor.0/15172 CPU: 3 PID: 15172 Comm: syz-executor.0 Call Trace: __dump_stack lib/dump_stack.c:82 [inline] dump_stack+0xbe/0xfd lib/dump_stack.c:123 print_address_description.constprop.0+0x1e/0x280 mm/kasan/report.c:400 __kasan_report.cold+0x6c/0x84 mm/kasan/report.c:560 kasan_report+0x3a/0x50 mm/kasan/report.c:585 ext4_xattr_inode_dec_ref_all+0x6ff/0x790 fs/ext4/xattr.c:1137 ext4_xattr_delete_inode+0x4c7/0xda0 fs/ext4/xattr.c:2896 ext4_evict_inode+0xb3b/0x1670 fs/ext4/inode.c:323 evict+0x39f/0x880 fs/inode.c:622 iput_final fs/inode.c:1746 [inline] iput fs/inode.c:1772 [inline] iput+0x525/0x6c0 fs/inode.c:1758 ext4_orphan_cleanup fs/ext4/super.c:3298 [inline] ext4_fill_super+0x8c57/0xba40 fs/ext4/super.c:5300 mount_bdev+0x355/0x410 fs/super.c:1446 legacy_get_tree+0xfe/0x220 fs/fs_context.c:611 vfs_get_tree+0x8d/0x2f0 fs/super.c:1576 do_new_mount fs/namespace.c:2983 [inline] path_mount+0x119a/0x1ad0 fs/namespace.c:3316 do_mount+0xfc/0x110 fs/namespace.c:3329 __do_sys_mount fs/namespace.c:3540 [inline] __se_sys_mount+0x219/0x2e0 fs/namespace.c:3514 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x67/0xd1 Memory state around the buggy address: ffff88807b002f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88807b002f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff88807b003000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff88807b003080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88807b003100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Above issue happens as ext4_xattr_delete_inode() isn't check xattr is valid if xattr is in inode. To solve above issue call xattr_check_inode() check if xattr if valid in inode. In fact, we can directly verify in ext4_iget_extra_inode(), so that there is no divergent verification. Fixes: e50e5129f384 ("ext4: xattr-in-inode support") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250208063141.1539283-3-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-18ext4: introduce ITAIL helperYe Bin
Introduce ITAIL helper to get the bound of xattr in inode. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250208063141.1539283-2-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-17hypfs_create_cpu_files(): add missing check for hypfs_mkdir() failureAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-03-17scsi: st: Tighten the page format heuristics with MODE SELECTKai Mäkisara
In the days when SCSI-2 was emerging, some drives did claim SCSI-2 but did not correctly implement it. The st driver first tries MODE SELECT with the page format bit set to set the block descriptor. If not successful, the non-page format is tried. The test only tests the sense code and this triggers also from illegal parameter in the parameter list. The test is limited to "old" devices and made more strict to remove false alarms. Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi> Link: https://lore.kernel.org/r/20250311112516.5548-4-Kai.Makisara@kolumbus.fi Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-03-17scsi: st: ERASE does not change tape locationKai Mäkisara
The SCSI ERASE command erases from the current position onwards. Don't clear the position variables. Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi> Link: https://lore.kernel.org/r/20250311112516.5548-3-Kai.Makisara@kolumbus.fi Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-03-17scsi: st: Fix array overflow in st_setup()Kai Mäkisara
Change the array size to follow parms size instead of a fixed value. Reported-by: Chenyuan Yang <chenyuan0y@gmail.com> Closes: https://lore.kernel.org/linux-scsi/CALGdzuoubbra4xKOJcsyThdk5Y1BrAmZs==wbqjbkAgmKS39Aw@mail.gmail.com/ Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi> Link: https://lore.kernel.org/r/20250311112516.5548-2-Kai.Makisara@kolumbus.fi Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-03-17scsi: target: tcm_loop: Fix wrong abort tagGuixin Liu
When the tcm_loop_nr_hw_queues is set to a value greater than 1, the tags of requests in the block layer are no longer unique. This may lead to erroneous aborting of commands with the same tag. The issue can be resolved by using blk_mq_unique_tag to generate globally unique identifiers by combining the hardware queue index and per-queue tags. Fixes: 6375f8908255 ("tcm_loop: Fixup tag handling") Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Link: https://lore.kernel.org/r/20250313014728.105849-1-kanie@linux.alibaba.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-03-17scsi: lpfc: Restore clearing of NLP_UNREG_INP in ndlp->nlp_flagEwan D. Milne
Commit 32566a6f1ae5 ("scsi: lpfc: Remove NLP_RELEASE_RPI flag from nodelist structure") introduced a regression with SLI-3 adapters (e.g. LPe12000 8Gb) where a Link Down / Link Up such as caused by disabling an host FC switch port would result in the devices remaining in the transport-offline state and multipath reporting them as failed. This problem was not seen with newer SLI-4 adapters. The problem was caused by portions of the patch which removed the functions __lpfc_sli_rpi_release() and lpfc_sli_rpi_release() and all their callers. This was presumably because with the removal of the NLP_RELEASE_RPI flag there was no need to free the rpi. However, __lpfc_sli_rpi_release() and lpfc_sli_rpi_release() which calls it reset the NLP_UNREG_INP flag. And, lpfc_sli_def_mbox_cmpl() has a path where __lpfc_sli_rpi_release() was called in a particular case where NLP_UNREG_INP was not otherwise cleared because of other conditions. Restoring the else clause of this conditional and simply clearing the NLP_UNREG_INP flag appears to resolve the problem with SLI-3 adapters. It should be noted that the code path in question is not specific to SLI-3, but there are other SLI-4 code paths which may have masked the issue. Fixes: 32566a6f1ae5 ("scsi: lpfc: Remove NLP_RELEASE_RPI flag from nodelist structure") Cc: stable@vger.kernel.org Tested-by: Marco Patalano <mpatalan@redhat.com> Signed-off-by: Ewan D. Milne <emilne@redhat.com> Link: https://lore.kernel.org/r/20250317163731.356873-1-emilne@redhat.com Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-03-17scsi: hisi_sas: Fixed failure to issue vendor specific commandsXingui Yang
At present, we determine the protocol through the cmd type, but other cmd types, such as vendor-specific commands, default to the PIO protocol. This strategy often causes the execution of different vendor-specific commands to fail. In fact, for these commands, a better way is to use the protocol configured by the command's tf to determine its protocol. Fixes: 6f2ff1a1311e ("hisi_sas: add v2 path to send ATA command") Signed-off-by: Xingui Yang <yangxingui@huawei.com> Link: https://lore.kernel.org/r/20250220090011.313848-1-liyihang9@huawei.com Reviewed-by: Yihang Li <liyihang9@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-03-17scsi: fnic: Remove unnecessary NUL-terminationsThorsten Blum
strscpy_pad() already NUL-terminates 'data' at the corresponding indexes. Remove any unnecessary NUL-terminations. No functional changes intended. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://lore.kernel.org/r/20250314221626.43174-2-thorsten.blum@linux.dev Reviewed-by: Karan Tilak Kumar <kartilak@cisco.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-03-17scsi: fnic: Remove redundant flush_workqueue() callsChen Ni
destroy_workqueue() already drains the queue before destroying it, so there is no need to flush it explicitly. Remove the redundant flush_workqueue() calls. This was generated with coccinelle: @@ expression E; @@ - flush_workqueue(E); destroy_workqueue(E); Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Link: https://lore.kernel.org/r/20250312074320.1430175-1-nichen@iscas.ac.cn Reviewed-by: Karan Tilak Kumar <kartilak@cisco.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-03-18f2fs: fix missing discard for active segmentsChunhai Guo
During a checkpoint, the current active segment X may not be handled properly. This occurs when segment X has 0 valid blocks and a non-zero number of discard blocks, for the following reasons: locate_dirty_segment() does not mark any active segment as a prefree segment. As a result, segment X is not included in dirty_segmap[PRE], and f2fs_clear_prefree_segments() skips it when handling prefree segments. add_discard_addrs() skips any segment with 0 valid blocks, so segment X is also skipped. Consequently, no `struct discard_cmd` is actually created for segment X. However, the ckpt_valid_map and cur_valid_map of segment X are synced by seg_info_to_raw_sit() during the current checkpoint process. As a result, it cannot find the missing discard bits even in subsequent checkpoints. Consequently, the value of sbi->discard_blks remains non-zero. Thus, when f2fs is umounted, CP_TRIMMED_FLAG will not be set due to the non-zero sbi->discard_blks. Relevant code process: f2fs_write_checkpoint() f2fs_flush_sit_entries() list_for_each_entry_safe(ses, tmp, head, set_list) { for_each_set_bit_from(segno, bitmap, end) { ... add_discard_addrs(sbi, cpc, false); // skip segment X due to its 0 valid blocks ... seg_info_to_raw_sit(); // sync ckpt_valid_map with cur_valid_map for segment X ... } } f2fs_clear_prefree_segments(); // segment X is not included in dirty_segmap[PRE] and is skipped This issue is easy to reproduce with the following operations: root # mkfs.f2fs -f /dev/f2fs_dev root # mount -t f2fs /dev/f2fs_dev /mnt_point root # dd if=/dev/blk_dev of=/mnt_point/1.bin bs=4k count=256 root # sync root # rm /mnt_point/1.bin root # umount /mnt_point root # dump.f2fs /dev/f2fs_dev | grep "checkpoint state" Info: checkpoint state = 45 : crc compacted_summary unmount ---- 'trimmed' flag is missing Since add_discard_addrs() can handle active segments with non-zero valid blocks, it is reasonable to fix this issue by allowing it to also handle active segments with 0 valid blocks. Fixes: b29555505d81 ("f2fs: add key functions for small discards") Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-18f2fs: optimize f2fs DIO overwritesYohan Joung
this is unnecessary when we know we are overwriting already allocated blocks and the overhead of starting a transaction can be significant especially for multithreaded workloads doing small writes. Signed-off-by: Yohan Joung <yohan.joung@sk.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-17docs: driver-api: firmware: clarify userspace requirementsJacek Lawrynowicz
The guidelines mention that firmware updates can't break the kernel, but it doesn't state directly that they can't break userspace programs. Make it explicit that firmware updates cannot break UAPI. Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Acked-by: Dave Airlie <airlied@redhat.com> [jc: fixed "no trailing newline"] Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20250314100137.2972355-1-jacek.lawrynowicz@linux.intel.com
2025-03-17x86/fpu/xstate: Fix inconsistencies in guest FPU xfeaturesChao Gao
Guest FPUs manage vCPU FPU states. They are allocated via fpu_alloc_guest_fpstate() and are resized in fpstate_realloc() when XFD features are enabled. Since the introduction of guest FPUs, there have been inconsistencies in the kernel buffer size and xfeatures: 1. fpu_alloc_guest_fpstate() uses fpu_user_cfg since its introduction. See: 69f6ed1d14c6 ("x86/fpu: Provide infrastructure for KVM FPU cleanup") 36487e6228c4 ("x86/fpu: Prepare guest FPU for dynamically enabled FPU features") 2. __fpstate_reset() references fpu_kernel_cfg to set storage attributes. 3. fpu->guest_perm uses fpu_kernel_cfg, affecting fpstate_realloc(). A recent commit in the tip:x86/fpu tree partially addressed the inconsistency between (1) and (3) by using fpu_kernel_cfg for size calculation in (1), but left fpu_guest->xfeatures and fpu_guest->perm still referencing fpu_user_cfg: https://lore.kernel.org/all/20250218141045.85201-1-stanspas@amazon.de/ 1937e18cc3cf ("x86/fpu: Fix guest FPU state buffer allocation size") The inconsistencies within fpu_alloc_guest_fpstate() and across the mentioned functions cause confusion. Fix them by using fpu_kernel_cfg consistently in fpu_alloc_guest_fpstate(), except for fields related to the UABI buffer. Referencing fpu_kernel_cfg won't impact functionalities, as: 1. fpu_guest->perm is overwritten shortly in fpu_init_guest_permissions() with fpstate->guest_perm, which already uses fpu_kernel_cfg. 2. fpu_guest->xfeatures is solely used to check if XFD features are enabled. Including supervisor xfeatures doesn't affect the check. Fixes: 36487e6228c4 ("x86/fpu: Prepare guest FPU for dynamically enabled FPU features") Suggested-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: David Woodhouse <dwmw2@infradead.org> Link: https://lore.kernel.org/r/20250317140613.1761633-1-chao.gao@intel.com
2025-03-17docs: clarify rules wrt tagging other peopleThorsten Leemhuis
Point out that explicit permission is usually needed to tag other people in changes, but mention that implicit permission can be sufficient in certain cases. This fixes slight inconsistencies between Reported-by: and Suggested-by: and makes the usage more intuitive. While at it, explicitly mention the dangers of our bugzilla instance, as it makes it easy to forget that email addresses visible there are only shown to logged-in users. The latter is not a theoretical issue, as one maintainer mentioned that his employer received a EU GDPR (general data protection regulation) complaint after exposing a email address used in bugzilla through a tag in a patch description. Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Simona Vetter <simona.vetter@ffwll.ch> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/588cf2763baa8fea1f4825f4eaa7023fe88bb6c1.1738852082.git.linux@leemhuis.info
2025-03-17i3c: master: svc: Fix i3c_master_get_free_addr return checkStanley Chu
The return value of i3c_master_get_free_addr is assigned to a variable with wrong type, so it can't be negative. Use a signed integer for the return value. If the value is negative, break the process and propagate the error code. This commit also fixes the uninitialized symbol 'dyn_addr', reported by Smatch static checker. Fixes: 4008a74e0f9b ("i3c: master: svc: Fix npcm845 FIFO empty issue") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/029e5ac0-5444-4a8e-bca4-cec55950d2b9@stanley.mountain/ Signed-off-by: Stanley Chu <yschu@nuvoton.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20250310023304.2335792-1-yschu@nuvoton.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2025-03-17docs: Remove outdated highuid.rst documentationkth
The highuid.rst document describes a transition that is outdated and no longer relevant. Additionally, it references filesystems (ncpfs and smbfs), which have been removed or replaced. Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Kang Taeho <kangtaeho2456@gmail.com> Link: https://lore.kernel.org/r/20250313145650.278346-1-kangtaeho2456@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-03-17perf/x86: Check data address for IBS software filterNamhyung Kim
The IBS software filter is filtering kernel samples for regular users in the PMI handler. It checks the instruction address in the IBS register to determine if it was in kernel mode or not. But it turns out that it's possible to report a kernel data address even if the instruction address belongs to user-space. Matteo Rizzo found that when an instruction raises an exception, IBS can report some kernel data addresses like IDT while holding the faulting instruction's RIP. To prevent an information leak, it should double check if the data address in PERF_SAMPLE_DATA is in the kernel space as well. [ mingo: Clarified the changelog ] Suggested-by: Matteo Rizzo <matteorizzo@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250317163755.1842589-1-namhyung@kernel.org
2025-03-17smb: client: don't retry IO on failed negprotos with soft mountsPaulo Alcantara
If @server->tcpStatus is set to CifsNeedReconnect after acquiring @ses->session_mutex in smb2_reconnect() or cifs_reconnect_tcon(), it means that a concurrent thread failed to negotiate, in which case the server is no longer responding to any SMB requests, so there is no point making the caller retry the IO by returning -EAGAIN. Fix this by returning -EHOSTDOWN to the callers on soft mounts. Cc: David Howells <dhowells@redhat.com> Reported-by: Jay Shin <jaeshin@redhat.com> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-03-17rtc: cros-ec: Avoid a couple of -Wflex-array-member-not-at-end warningsGustavo A. R. Silva
Use the `DEFINE_RAW_FLEX()` helper for an on-stack definition of a flexible structure where the size of the flexible-array member is known at compile-time, and refactor the rest of the code, accordingly. So, with these changes, fix the following warning: drivers/rtc/rtc-cros-ec.c:62:40: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/rtc/rtc-cros-ec.c:40:40: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org> Link: https://lore.kernel.org/r/Z9PpPg06OK8ghNvm@kspp Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2025-03-17dt-bindings: rtc: pcf2127: Reference spi-peripheral-props.yamlFabio Estevam
PCF2127 is an SPI device, thus its binding should reference spi-peripheral-props.yaml. Add a reference to spi-peripheral-props.yaml to fix the following dt-schema warning: imx7d-flex-concentrator.dtb: rtc@0: 'spi-max-frequency' does not match any of the regexes: 'pinctrl-[0-9]+' Signed-off-by: Fabio Estevam <festevam@denx.de> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20250317120356.2195670-1-festevam@gmail.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2025-03-17rtc: rzn1: implement one-second accuracy for alarmsWolfram Sang
The hardware alarm only supports one-minute accuracy which is coarse and disables UIE usage. Use the 1-second interrupt to achieve per-second accuracy. It is activated once we hit the per-minute alarm. The new feature is optional. When there is no 1-second interrupt, old behaviour with per-minute accuracy is used as before. With this feature, all tests of 'rtctest' are successfully passed. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20250305101038.9933-2-wsa+renesas@sang-engineering.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2025-03-17rtc: pcf50633: RemoveDr. David Alan Gilbert
The pcf50633 was used as part of the OpenMoko devices but the support for its main chip was recently removed in: commit 61b7f8920b17 ("ARM: s3c: remove all s3c24xx support") See https://lore.kernel.org/all/Z8z236h4B5A6Ki3D@gallifrey/ Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://lore.kernel.org/r/20250311014959.743322-3-linux@treblig.org Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2025-03-17Merge tag 'soc-fixes-6.14-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "The majority of these last fixes are for devicetree files. These address two important regressions for the Qualcomm SMMU and the Raspberry Pi 4 USB controller, as well as a larger number of patches fixing minor mistakes in board specific files for Rockchips, i.MX, starfive and broadcom. The non-DT changes are - A fix for an old boot regression on Renesas shmobile chips - Another boot time regression for for the Qualcomm PDR SoC driver, among a few other Qualcomm firmware driver fixes for efivars and tzmem - Minor Kconfig fixes for davinci and OMAP1 - Minor code fixes for sparx5 reset controllers, OMAP memory controller, i.MX SCU, cpufreq and SoC drivers and a Hisilicon SoC driver - One more update to the Asahi maintainers, adding Neal Gompa as a reviewer" * tag 'soc-fixes-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (35 commits) ARM: davinci: da850: fix selecting ARCH_DAVINCI_DA8XX soc: hisilicon: kunpeng_hccs: Fix incorrect string assembly memory: omap-gpmc: drop no compatible check reset: mchp: sparx5: Fix for lan966x ARM: shmobile: smp: Enforce shmobile_smp_* alignment MAINTAINERS: Add myself (Neal Gompa) as a reviewer for ARM Apple support MAINTAINERS: Add apple-spi driver & binding files arm64: dts: rockchip: slow down emmc freq for rock 5 itx ARM: dts: BCM5301X: Fix switch port labels of ASUS RT-AC3200 ARM: dts: BCM5301X: Fix switch port labels of ASUS RT-AC5300 ARM: dts: bcm2711: Don't mark timer regs unconfigured ARM: OMAP1: select CONFIG_GENERIC_IRQ_CHIP arm64: dts: rockchip: Add missing PCIe supplies to RockPro64 board dtsi arm64: dts: rockchip: Add avdd HDMI supplies to RockPro64 board dtsi arm64: dts: rockchip: Remove undocumented sdmmc property from lubancat-1 arm64: dts: rockchip: fix pinmux of UART5 for PX30 Ringneck on Haikou arm64: dts: rockchip: fix pinmux of UART0 for PX30 Ringneck on Haikou arm64: dts: rockchip: fix u2phy1_host status for NanoPi R4S arm64: dts: bcm2712: PL011 UARTs are actually r1p5 ARM: dts: bcm2711: PL011 UARTs are actually r1p5 ...
2025-03-17Merge tag 'probes-fixes-v6.14-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixes from Masami Hiramatsu: - Clean up tprobe correctly when module unload Tracepoint probes do not set TRACEPOINT_STUB on the 'tpoint' pointer when unloading a module, thus they show as a normal 'fprobe' instead of 'tprobe' and never come back - Fix leakage of tprobe module refcount When a tprobe's target module is loaded, it gets the module's refcount in the module notifier but forgot to put it after registering the probe on it. Fix it by getting the refcount only when registering tprobe. * tag 'probes-fixes-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: tprobe-events: Fix leakage of module refcount tracing: tprobe-events: Fix to clean up tprobe correctly when module unload
2025-03-17vfio/pci: Handle INTx IRQ_NOTCONNECTEDAlex Williamson
Some systems report INTx as not routed by setting pdev->irq to IRQ_NOTCONNECTED, resulting in a -ENOTCONN error when trying to setup eventfd signaling. Include this in the set of conditions for which the PIN register is virtualized to zero. Additionally consolidate vfio_pci_get_irq_count() to use this virtualized value in reporting INTx support via ioctl and sanity checking ioctl paths since pdev->irq is re-used when the device is in MSI mode. The combination of these results in both the config space of the device and the ioctl interface behaving as if the device does not support INTx. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250311230623.1264283-1-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-03-17ASoC: Convert to modern PM macrosMark Brown
Merge series from Takashi Iwai <tiwai@suse.de>: This is a revised series of small and trivial patches to convert to the newer PM macros, e.g. from SET_RUNTIME_PM_OPS() to RUNTIME_PM_OPS(). The conversions are systematic, and we could reduce messy __maybe_unused and ifdefs with those changes. Merely code refactoring, and shouldn't change the actual driver behavior.
2025-03-17NFS: Refactor trace_nfs4_offload_cancelChuck Lever
Add a trace_nfs4_offload_status trace point that looks just like trace_nfs4_offload_cancel. Promote that event to an event class to avoid duplicating code. An alternative approach would be to expand trace_nfs4_offload_status to report more of the actual OFFLOAD_STATUS result. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Link: https://lore.kernel.org/r/20250113153235.48706-16-cel@kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-17NFS: Use NFSv4.2's OFFLOAD_STATUS operationChuck Lever
We've found that there are cases where a transport disconnection results in the loss of callback RPCs. NFS servers typically do not retransmit callback operations after a disconnect. This can be a problem for the Linux NFS client's current implementation of asynchronous COPY, which waits indefinitely for a CB_OFFLOAD callback. If a transport disconnect occurs while an async COPY is running, there's a good chance the client will never get the completing CB_OFFLOAD. Fix this by implementing the OFFLOAD_STATUS operation so that the Linux NFS client can probe the NFS server if it doesn't see a CB_OFFLOAD in a reasonable amount of time. This patch implements a simplistic check. As future work, the client might also be able to detect whether there is no forward progress on the request asynchronous COPY operation, and CANCEL it. Suggested-by: Olga Kornievskaia <kolga@netapp.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=218735 Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Link: https://lore.kernel.org/r/20250113153235.48706-15-cel@kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-17NFS: Implement NFSv4.2's OFFLOAD_STATUS operationChuck Lever
Enable the Linux NFS client to observe the progress of an offloaded asynchronous COPY operation. This new operation will be put to use in a subsequent patch. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Link: https://lore.kernel.org/r/20250113153235.48706-14-cel@kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-17NFS: Implement NFSv4.2's OFFLOAD_STATUS XDRChuck Lever
Add XDR encoding and decoding functions for the NFSv4.2 OFFLOAD_STATUS operation. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Link: https://lore.kernel.org/r/20250113153235.48706-13-cel@kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-17NFS: fix open_owner_id_maxsz and related fields.NeilBrown
A recent change increased the size of an NFSv4 open owner, but didn't increase the corresponding max_sz defines. This is not know to have caused failure, but should be fixed. This patch also fixes some relates _maxsz fields that are wrong. Note that the XXX_owner_id_maxsz values now are only the size of the id and do NOT include the len field that will always preceed the id in xdr encoding. I think this is clearer. Reported-by: David Disseldorp <ddiss@suse.com> Fixes: d98f72272500 ("nfs: simplify and guarantee owner uniqueness.") Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-17NFSv4: Avoid unnecessary scans of filesystems for delayed delegationsTrond Myklebust
The amount of looping through the list of delegations is occasionally leading to soft lockups. If the state manager was asked to manage the delayed return of delegations, then only scan those filesystems containing delegations that were marked as being delayed. Fixes: be20037725d1 ("NFSv4: Fix delegation return in cases where we have to retry") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-17NFSv4: Avoid unnecessary scans of filesystems for expired delegationsTrond Myklebust
The amount of looping through the list of delegations is occasionally leading to soft lockups. If the state manager was asked to reap the expired delegations, it should scan only those filesystems that hold delegations that need to be reaped. Fixes: 7f156ef0bf45 ("NFSv4: Clean up nfs_delegation_reap_expired()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-17NFSv4: Avoid unnecessary scans of filesystems for returning delegationsTrond Myklebust
The amount of looping through the list of delegations is occasionally leading to soft lockups. If the state manager was asked to return delegations asynchronously, it should only scan those filesystems that hold delegations that need to be returned. Fixes: af3b61bf6131 ("NFSv4: Clean up nfs_client_return_marked_delegations()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-17NFSv4: Don't trigger uneccessary scans for return-on-close delegationsTrond Myklebust
The amount of looping through the list of delegations is occasionally leading to soft lockups. Avoid at least some loops by not requiring the NFSv4 state manager to scan for delegations that are marked for return-on-close. Instead, either mark them for immediate return (if possible) or else leave it up to nfs4_inode_return_delegation_on_close() to return them once the file is closed by the application. Fixes: b757144fd77c ("NFSv4: Be less aggressive about returning delegations for open files") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-17Merge branch 'bpftool-using-the-right-format-specifiers'Andrii Nakryiko
Jiayuan Chen says: ==================== bpftool: Using the right format specifiers This patch adds the -Wformat-signedness compiler flag to detect and prevent format string errors, where signed or unsigned types are mismatched with format specifiers. Additionally, it fixes some format string errors that were not fully addressed by the previous patch [1]. [1] https://lore.kernel.org/bpf/20250207123706.727928-1-mrpre@163.com/T/#u --- v1->v2: https://lore.kernel.org/bpf/20250310142037.45932-1-jiayuan.chen@linux.dev/ --- ==================== Link: https://patch.msgid.link/20250311112809.81901-1-jiayuan.chen@linux.dev Signed-off-by: Andrii Nakryiko <andrii@kernel.org>