summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2024-01-04NFSv4.1/pnfs: Ensure we handle the error NFS4ERR_RETURNCONFLICTTrond Myklebust
Once the client has processed the CB_LAYOUTRECALL, but has not yet successfully returned the layout, the server is supposed to switch to returning NFS4ERR_RETURNCONFLICT. This patch ensures that we handle that return value correctly. Fixes: 183d9e7b112a ("pnfs: rework LAYOUTGET retry handling") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04NFSv4.1: if referring calls are complete, trust the stateid argumentTrond Myklebust
If the server is recalling a layout, and sends us a list of referring calls that we can see are complete, then we should just trust that the stateid argument is correct, even if the sequence id doesn't match the one we hold. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04NFSv4: Track the number of referring calls in struct cb_process_stateTrond Myklebust
When the server gives us a set of referring calls, to tell us that the NFSv4.1 callback needs to be ordered with respect to those calls, then we may want to make that information available to the operations. In certain cases, it may allow them to optimise their behaviour due to the extra knowledge. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04NFS: Use parent's objective cred in nfs_access_login_time()Scott Mayhew
The subjective cred (task->cred) can potentially be overridden and subsquently freed in non-RCU context, which could lead to a panic if we try to use it in cred_fscmp(). Use __task_cred(), which returns the objective cred (task->real_cred) instead. Fixes: 0eb43812c027 ("NFS: Clear the file access cache upon login") Fixes: 5e9a7b9c2ea1 ("NFS: Fix up a sparse warning") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04NFSv4: Always ask for type with READDIRBenjamin Coddington
Again we have claimed regressions for walking a directory tree, this time with the "find" utility which always tries to optimize away asking for any attributes until it has a complete list of entries. This behavior makes the readdir plus heuristic do the wrong thing, which causes a storm of GETATTRs to determine each entry's type in order to continue the walk. For v4 add the type attribute to each READDIR request to include it no matter the heuristic. This allows a simple `find` command to proceed quickly through a directory tree. Suggested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04pnfs/blocklayout: Don't add zero-length pnfs_block_devBenjamin Coddington
We noticed a SCSI device that refused to allow READ CAPACITY when the device had a PR with exclusive access, registrants only. The result of this situation is that the blocklayout driver adds a pnfs_block_dev of zero length which always fails the offset_in_map tests. Instead of continuously trying to do pNFS for this case, just mark the device as unavailable which will allow the client to fallback to the MDS for the duration of PNFS_DEVICE_RETRY_TIMEOUT. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04blocklayoutdriver: Fix reference leak of pnfs_device_nodeBenjamin Coddington
The error path for blocklayout's device lookup is missing a reference drop for the case where a lookup finds the device, but the device is marked with NFS_DEVICEID_UNAVAILABLE. Fixes: b3dce6a2f060 ("pnfs/blocklayout: handle transient devices") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04Merge branches 'acpi-pm', 'acpi-video', 'acpi-apei' and 'acpi-extlog'Rafael J. Wysocki
Merge an ACPI power management change, ACPI backlight driver changes, APEI updates and ACPI extlog driver changes for 6.8-rc1: - Modify the ACPI LPIT table handling code to avoid u32 multiplication overflows in state residency computations (Nikita Kiryushin). - Drop an unused helper function from the ACPI backlight (video) driver and add a clarifying comment to it (Hans de Goede). - Update the ACPI backlight driver to avoid using uninitialized memory in some cases (Nikita Kiryushin). - Add ACPI backlight quirk for the Colorful X15 AT 23 laptop (Yuluo Qiu). - Add support for vendor-defined error types to the ACPI APEI error injection code (Avadhut Naik). - Adjust APEI to properly set MF_ACTION_REQUIRED on synchronous memory failure events, so they are handled differently from the asynchronous ones (Shuai Xue). - Fix NULL pointer dereference check in the ACPI extlog driver (Prarit Bhargava). - Adjust the ACPI extlog driver to clear the Extended Error Log status when RAS_CEC handled the error (Tony Luck). * acpi-pm: ACPI: LPIT: Avoid u32 multiplication overflow * acpi-video: ACPI: video: Add quirk for the Colorful X15 AT 23 Laptop ACPI: video: check for error while searching for backlight device parent ACPI: video: Drop should_check_lcd_flag() ACPI: video: Add comment about acpi_video_backlight_use_native() usage * acpi-apei: ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events ACPI: APEI: EINJ: Add support for vendor defined error types platform/chrome: cros_ec_debugfs: Fix permissions for panicinfo fs: debugfs: Add write functionality to debugfs blobs ACPI: APEI: EINJ: Refactor available_error_type_show() * acpi-extlog: ACPI: extlog: Clear Extended Error Log status when RAS_CEC handled the error ACPI: extlog: fix NULL pointer dereference check
2024-01-03Merge tag 'trace-v6.7-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix a NULL kernel dereference in set_gid() on tracefs mounting. When tracefs is mounted with "gid=1000", it will update the existing dentries to have the new gid. The tracefs_inode which is retrieved by a container_of(dentry->d_inode) has flags to see if the inode belongs to the eventfs system. The issue that was fixed was if getdents() was called on tracefs that was previously mounted, and was not closed. It will leave a "cursor dentry" in the subdirs list of the current dentries that set_gid() walks. On a remount of tracefs, the container_of(dentry->d_inode) will dereference a NULL pointer and cause a crash when referenced. Simply have a check for dentry->d_inode to see if it is NULL and if so, skip that entry. - Fix the bits of the eventfs_inode structure. The "is_events" bit was taken from the nr_entries field, but the nr_entries field wasn't updated to be 30 bits and was still 31. Including the "is_freed" bit this would use 33 bits which would make the structure use another integer for just one bit. * tag 'trace-v6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: eventfs: Fix bitwise fields for "is_events" tracefs: Check for dentry->d_inode exists in set_gid()
2024-01-02eventfs: Fix bitwise fields for "is_events"Steven Rostedt (Google)
A flag was needed to denote which eventfs_inode was the "events" directory, so a bit was taken from the "nr_entries" field, as there's not that many entries, and 2^30 is plenty. But the bit number for nr_entries was not updated to reflect the bit taken from it, which would add an unnecessary integer to the structure. Link: https://lore.kernel.org/linux-trace-kernel/20240102151832.7ca87275@gandalf.local.home Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: 7e8358edf503e ("eventfs: Fix file and directory uid and gid ownership") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-01-02tracefs: Check for dentry->d_inode exists in set_gid()Steven Rostedt (Google)
If a getdents() is called on the tracefs directory but does not get all the files, it can leave a "cursor" dentry in the d_subdirs list of tracefs dentry. This cursor dentry does not have a d_inode for it. Before referencing tracefs_inode from the dentry, the d_inode must first be checked if it has content. If not, then it's not a tracefs_inode and can be ignored. The following caused a crash: #define getdents64(fd, dirp, count) syscall(SYS_getdents64, fd, dirp, count) #define BUF_SIZE 256 #define TDIR "/tmp/file0" int main(void) { char buf[BUF_SIZE]; int fd; int n; mkdir(TDIR, 0777); mount(NULL, TDIR, "tracefs", 0, NULL); fd = openat(AT_FDCWD, TDIR, O_RDONLY); n = getdents64(fd, buf, BUF_SIZE); ret = mount(NULL, TDIR, NULL, MS_NOSUID|MS_REMOUNT|MS_RELATIME|MS_LAZYTIME, "gid=1000"); return 0; } That's because the 256 BUF_SIZE was not big enough to read all the dentries of the tracefs file system and it left a "cursor" dentry in the subdirs of the tracefs root inode. Then on remounting with "gid=1000", it would cause an iteration of all dentries which hit: ti = get_tracefs(dentry->d_inode); if (ti && (ti->flags & TRACEFS_EVENT_INODE)) eventfs_update_gid(dentry, gid); Which crashed because of the dereference of the cursor dentry which had a NULL d_inode. In the subdir loop of the dentry lookup of set_gid(), if a child has a NULL d_inode, simply skip it. Link: https://lore.kernel.org/all/20240102135637.3a21fb10@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20240102151249.05da244d@gandalf.local.home Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: 7e8358edf503e ("eventfs: Fix file and directory uid and gid ownership") Reported-by: "Ubisectech Sirius" <bugreport@ubisectech.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-01-02jfs: Add missing set_freezable() for freezable kthreadKevin Hao
The kernel thread function jfs_lazycommit() and jfs_sync() invoke the try_to_freeze() in its loop. But all the kernel threads are no-freezable by default. So if we want to make a kernel thread to be freezable, we have to invoke set_freezable() explicitly. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-01-02jfs: fix array-index-out-of-bounds in diNewExtEdward Adam Davis
[Syz report] UBSAN: array-index-out-of-bounds in fs/jfs/jfs_imap.c:2360:2 index -878706688 is out of range for type 'struct iagctl[128]' CPU: 1 PID: 5065 Comm: syz-executor282 Not tainted 6.7.0-rc4-syzkaller-00009-gbee0e7762ad2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:217 [inline] __ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348 diNewExt+0x3cf3/0x4000 fs/jfs/jfs_imap.c:2360 diAllocExt fs/jfs/jfs_imap.c:1949 [inline] diAllocAG+0xbe8/0x1e50 fs/jfs/jfs_imap.c:1666 diAlloc+0x1d3/0x1760 fs/jfs/jfs_imap.c:1587 ialloc+0x8f/0x900 fs/jfs/jfs_inode.c:56 jfs_mkdir+0x1c5/0xb90 fs/jfs/namei.c:225 vfs_mkdir+0x2f1/0x4b0 fs/namei.c:4106 do_mkdirat+0x264/0x3a0 fs/namei.c:4129 __do_sys_mkdir fs/namei.c:4149 [inline] __se_sys_mkdir fs/namei.c:4147 [inline] __x64_sys_mkdir+0x6e/0x80 fs/namei.c:4147 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x45/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b RIP: 0033:0x7fcb7e6a0b57 Code: ff ff 77 07 31 c0 c3 0f 1f 40 00 48 c7 c2 b8 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 b8 53 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffd83023038 EFLAGS: 00000286 ORIG_RAX: 0000000000000053 RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007fcb7e6a0b57 RDX: 00000000000a1020 RSI: 00000000000001ff RDI: 0000000020000140 RBP: 0000000020000140 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000286 R12: 00007ffd830230d0 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [Analysis] When the agstart is too large, it can cause agno overflow. [Fix] After obtaining agno, if the value is invalid, exit the subsequent process. Reported-and-tested-by: syzbot+553d90297e6d2f50dbc7@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Modified the test from agno > MAXAG to agno >= MAXAG based on linux-next report by kernel test robot (Dan Carpenter). Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-01-01bcachefs: check_directory_structure() can now be run onlineKent Overstreet
Now that we have dynamically resizable btree paths, check_directory_structure() can check one path - inode up to the root - in a single transaction. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Fix reattach_inode() for snapshotsKent Overstreet
reattach_inode() was broken w.r.t. snapshots - we'd lookup the subvolume to look up lost+found, but if we're in an interior node snapshot that didn't make any sense. Instead, this adds a dirent path for creating in a specific snapshot, skipping the subvolume; and we also make sure to create lost+found in the root snapshot, to avoid conflicts with lost+found being created in overlapping snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_btree_trans_peek_slot_updatesKent Overstreet
refactoring the BTREE_ITER_WITH_UPDATES code, prep for removing the flag and making it always-on Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_btree_trans_peek_prev_updatesKent Overstreet
bch2_btree_iter_peek_prev() now supports BTREE_ITER_WITH_UPDATES Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_btree_trans_peek_updatesKent Overstreet
refactoring the BTREE_ITER_WITH_UPDATES code, prep for removing the flag and making it always-on Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: growable btree_pathsKent Overstreet
XXX: we're allocating memory with btree locks held - bad We need to plumb through an error path so we can do allocate_dropping_locks() - but we're merging this now because it fixes a transaction path overflow caused by indirect extent fragmentation, and the resize path is rare. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Fix interior update path btree_path usesKent Overstreet
Since the btree_paths array is now about to become growable, we have to be careful not to refer to paths by pointer across contexts where they may be reallocated. This fixes the remaining btree_interior_update() paths - split and merge. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: trans->nr_pathsKent Overstreet
Start to plumb through dynamically growable btree_paths; this patch replaces most BTREE_ITER_MAX references with trans->nr_paths. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: trans->updates will also be resizableKent Overstreet
the reflink triggers are also bumping up against the maximum number of paths in a transaction - and generating proportional numbers of updates. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONSKent Overstreet
- Some tweaks to greatly reduce locking overhead for the list of btree transactions, so that it can always be enabled: leave btree_trans objects on the list when they're on the percpu single item freelist, and only check for duplicates in the same process when CONFIG_BCACHEFS_DEBUG is enabled - don't zero out the full btree_trans() unless we allocated it from the mempool Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: rcu protect trans->pathsKent Overstreet
Upcoming patches are going to be changing trans->paths to a reallocatable buffer. We need to guard against use after free when it's used by other threads; this introduces RCU protection to those paths and changes them to check for trans->paths == NULL Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Clean up btree_transKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: kill btree_path.idxKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: get_unlocked_mut_path() -> btree_path_idx_tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_btree_iter_peek_prev() no longer uses path->idxKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_path_get() no longer uses path->idxKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: trans_for_each_path_with_node() no longer uses path->idxKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: trans_for_each_path() no longer uses path->idxKent Overstreet
path->idx is now a code smell: we should be using path_idx_t, since it's stable across btree path reallocation. This is also a bit faster, using the same loop counter vs. fetching path->idx from each path we iterate over. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: kill trans_for_each_path_from()Kent Overstreet
dead code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_btree_path_to_text() -> btree_path_idx_tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: struct trans_for_each_path_inorder_iterKent Overstreet
reducing our usage of path->idx Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: btree_insert_entry -> btree_path_idx_tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: btree_iter -> btree_path_idx_tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: btree_path_alloc() -> btree_path_idx_tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_btree_path_traverse() -> btree_path_idx_tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_btree_path_make_mut() -> btree_path_idx_tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_btree_path_set_pos() -> btree_path_idx_tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs; bch2_path_put() -> btree_path_idx_tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_path_get() -> btree_path_idx_tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: minor bch2_btree_path_set_pos() optimizationKent Overstreet
bpos_eq() is cheaper than bpos_cmp() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Kill GFP_NOFAIL usage in readahead pathKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Convert split_devs() to darrayKent Overstreet
Bit of cleanup & modernization: also moving this code to util.c, it'll be used by userspace as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: skip journal more often in key cache reclaimKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: for_each_keylist_key() declares loop iterKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bkey_for_each_ptr() now declares loop iterKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: kill __bch2_btree_iter_peek_upto_and_restart()Kent Overstreet
dead code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: fsck -> bch2_trans_run()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>