summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2024-01-04ext4: clarify handling of unwritten bh in __ext4_block_zero_page_range()Ojaswin Mujoo
As an optimization, I was trying to work on exiting early from this function if dealing with unwritten extent since they anyways read 0. However, it was realised that there are certain code paths that can end up calling ext4_block_zero_page_range() for an unwritten bh that might still have data in pagecache. In this case, we can't exit early and we do require to process the bh and zero out the pagecache to ensure that a writeback can't kick in at a later time and flush the stale pagecache to disk. Since, adding the logic to exit early for unwritten bh was turning out to be much more nuanced and the current code already handles it well, just add a comment to explicitly document this behavior. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/d859b7ae5fe42e6626479b91ed9f4da3aae4c597.1698856309.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-01-04ext4: treat end of range as exclusive in ext4_zero_range()Ojaswin Mujoo
The call to filemap_write_and_wait_range() assumes the range passed to be inclusive, so fix the call to make sure we follow that. Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/e503107a7c73a2b68dec645c5ad798c437717c45.1698856309.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-01-04ext4: enable dioread_nolock as default for bs < ps caseOjaswin Mujoo
dioread_nolock was originally disabled as a default option for bs < ps scenarios due to a data corruption issue. Since then, we've had some fixes in this area which address such issues. Enable dioread_nolock by default and remove the experimental warning message for bs < ps path. dioread for bs < ps has been tested on a 64k pagesize machine using: kvm-xfstest -C 3 -g auto with the following configs: 64k adv bigalloc_4k bigalloc_64k data_journal encrypt dioread_nolock dioread_nolock_4k ext3 ext3conv nojournal And no new regressions were seen compared to baseline kernel. Suggested-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20231101154717.531865-1-ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-01-04ext4: delete redundant calculations in ext4_mb_get_buddy_page_lock()Gou Hao
'blocks_per_page' is always 1 after 'if (blocks_per_page >= 2)', 'pnum' and 'block' are equal in this case. Signed-off-by: Gou Hao <gouhao@uniontech.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231024035215.29474-1-gouhao@uniontech.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-01-04nfsd: drop the nfsd_put helperJeff Layton
It's not safe to call nfsd_put once nfsd_last_thread has been called, as that function will zero out the nn->nfsd_serv pointer. Drop the nfsd_put helper altogether and open-code the svc_put in its callers instead. That allows us to not be reliant on the value of that pointer when handling an error. Fixes: 2a501f55cd64 ("nfsd: call nfsd_last_thread() before final nfsd_put()") Reported-by: Zhi Li <yieli@redhat.com> Cc: NeilBrown <neilb@suse.de> Signed-off-by: Jeffrey Layton <jlayton@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt.c e009b2efb7a8 ("bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()") 0f2b21477988 ("bnxt_en: Fix compile error without CONFIG_RFS_ACCEL") https://lore.kernel.org/all/20240105115509.225aa8a2@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04NFSv4.1: Use the nfs_client's rpc timeouts for backchannelBenjamin Coddington
For backchannel requests that lookup the appropriate nfs_client, use the state-management rpc_clnt's rpc_timeout parameters for the backchannel's response. When the nfs_client cannot be found, fall back to using the xprt's default timeout parameters. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04rpc_pipefs: Replace one label in bl_resolve_deviceid()Markus Elfring
The kfree() function was called in one case by the bl_resolve_deviceid() function during error handling even if the passed data structure member contained a null pointer. This issue was detected by using the Coccinelle software. Thus use an other label. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04nfs: Remove writepageMatthew Wilcox (Oracle)
NFS already has writepages and migrate_folio, so it does not need to implement writepage. The writepage operation is deprecated as it leads to worse performance under high memory pressure due to folios being written out in LRU order rather than sequentially within a file. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04NFS: drop unused nfs_direct_req bytes_leftBenjamin Coddington
Now that we're calculating how large a remaining IO should be based on the current request's offset, we no longer need to track bytes_left on each struct nfs_direct_req. Drop the field, and clean up the direct request tracepoints. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04pNFS: Fix the pnfs block driver's calculation of layoutget sizeTrond Myklebust
Instead of relying on the value of the 'bytes_left' field, we should calculate the layout size based on the offset of the request that is being written out. Reported-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Fixes: 954998b60caa ("NFS: Fix error handling for O_DIRECT write scheduling") Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04nfs: print fileid in lookup tracepointsJeff Layton
With this we can see the dentry -> inode linkage that's being revalidated. A fileid of 0 means "negative dentry". Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04nfs: rename the nfs_async_rename_done tracepointJeff Layton
We do async renames in other cases besides sillyrenames now. This tracepoint name is now misleading. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04nfs: add new tracepoint at nfs4 revalidate entry pointJeff Layton
Add a call to the v4 d_revalidate entrypoint, just like the v3 one. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04NFSv4.1/pnfs: Ensure we handle the error NFS4ERR_RETURNCONFLICTTrond Myklebust
Once the client has processed the CB_LAYOUTRECALL, but has not yet successfully returned the layout, the server is supposed to switch to returning NFS4ERR_RETURNCONFLICT. This patch ensures that we handle that return value correctly. Fixes: 183d9e7b112a ("pnfs: rework LAYOUTGET retry handling") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04NFSv4.1: if referring calls are complete, trust the stateid argumentTrond Myklebust
If the server is recalling a layout, and sends us a list of referring calls that we can see are complete, then we should just trust that the stateid argument is correct, even if the sequence id doesn't match the one we hold. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04NFSv4: Track the number of referring calls in struct cb_process_stateTrond Myklebust
When the server gives us a set of referring calls, to tell us that the NFSv4.1 callback needs to be ordered with respect to those calls, then we may want to make that information available to the operations. In certain cases, it may allow them to optimise their behaviour due to the extra knowledge. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04NFS: Use parent's objective cred in nfs_access_login_time()Scott Mayhew
The subjective cred (task->cred) can potentially be overridden and subsquently freed in non-RCU context, which could lead to a panic if we try to use it in cred_fscmp(). Use __task_cred(), which returns the objective cred (task->real_cred) instead. Fixes: 0eb43812c027 ("NFS: Clear the file access cache upon login") Fixes: 5e9a7b9c2ea1 ("NFS: Fix up a sparse warning") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04NFSv4: Always ask for type with READDIRBenjamin Coddington
Again we have claimed regressions for walking a directory tree, this time with the "find" utility which always tries to optimize away asking for any attributes until it has a complete list of entries. This behavior makes the readdir plus heuristic do the wrong thing, which causes a storm of GETATTRs to determine each entry's type in order to continue the walk. For v4 add the type attribute to each READDIR request to include it no matter the heuristic. This allows a simple `find` command to proceed quickly through a directory tree. Suggested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04pnfs/blocklayout: Don't add zero-length pnfs_block_devBenjamin Coddington
We noticed a SCSI device that refused to allow READ CAPACITY when the device had a PR with exclusive access, registrants only. The result of this situation is that the blocklayout driver adds a pnfs_block_dev of zero length which always fails the offset_in_map tests. Instead of continuously trying to do pNFS for this case, just mark the device as unavailable which will allow the client to fallback to the MDS for the duration of PNFS_DEVICE_RETRY_TIMEOUT. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04blocklayoutdriver: Fix reference leak of pnfs_device_nodeBenjamin Coddington
The error path for blocklayout's device lookup is missing a reference drop for the case where a lookup finds the device, but the device is marked with NFS_DEVICEID_UNAVAILABLE. Fixes: b3dce6a2f060 ("pnfs/blocklayout: handle transient devices") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04Merge branches 'acpi-pm', 'acpi-video', 'acpi-apei' and 'acpi-extlog'Rafael J. Wysocki
Merge an ACPI power management change, ACPI backlight driver changes, APEI updates and ACPI extlog driver changes for 6.8-rc1: - Modify the ACPI LPIT table handling code to avoid u32 multiplication overflows in state residency computations (Nikita Kiryushin). - Drop an unused helper function from the ACPI backlight (video) driver and add a clarifying comment to it (Hans de Goede). - Update the ACPI backlight driver to avoid using uninitialized memory in some cases (Nikita Kiryushin). - Add ACPI backlight quirk for the Colorful X15 AT 23 laptop (Yuluo Qiu). - Add support for vendor-defined error types to the ACPI APEI error injection code (Avadhut Naik). - Adjust APEI to properly set MF_ACTION_REQUIRED on synchronous memory failure events, so they are handled differently from the asynchronous ones (Shuai Xue). - Fix NULL pointer dereference check in the ACPI extlog driver (Prarit Bhargava). - Adjust the ACPI extlog driver to clear the Extended Error Log status when RAS_CEC handled the error (Tony Luck). * acpi-pm: ACPI: LPIT: Avoid u32 multiplication overflow * acpi-video: ACPI: video: Add quirk for the Colorful X15 AT 23 Laptop ACPI: video: check for error while searching for backlight device parent ACPI: video: Drop should_check_lcd_flag() ACPI: video: Add comment about acpi_video_backlight_use_native() usage * acpi-apei: ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events ACPI: APEI: EINJ: Add support for vendor defined error types platform/chrome: cros_ec_debugfs: Fix permissions for panicinfo fs: debugfs: Add write functionality to debugfs blobs ACPI: APEI: EINJ: Refactor available_error_type_show() * acpi-extlog: ACPI: extlog: Clear Extended Error Log status when RAS_CEC handled the error ACPI: extlog: fix NULL pointer dereference check
2024-01-03Merge tag 'trace-v6.7-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix a NULL kernel dereference in set_gid() on tracefs mounting. When tracefs is mounted with "gid=1000", it will update the existing dentries to have the new gid. The tracefs_inode which is retrieved by a container_of(dentry->d_inode) has flags to see if the inode belongs to the eventfs system. The issue that was fixed was if getdents() was called on tracefs that was previously mounted, and was not closed. It will leave a "cursor dentry" in the subdirs list of the current dentries that set_gid() walks. On a remount of tracefs, the container_of(dentry->d_inode) will dereference a NULL pointer and cause a crash when referenced. Simply have a check for dentry->d_inode to see if it is NULL and if so, skip that entry. - Fix the bits of the eventfs_inode structure. The "is_events" bit was taken from the nr_entries field, but the nr_entries field wasn't updated to be 30 bits and was still 31. Including the "is_freed" bit this would use 33 bits which would make the structure use another integer for just one bit. * tag 'trace-v6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: eventfs: Fix bitwise fields for "is_events" tracefs: Check for dentry->d_inode exists in set_gid()
2024-01-02eventfs: Fix bitwise fields for "is_events"Steven Rostedt (Google)
A flag was needed to denote which eventfs_inode was the "events" directory, so a bit was taken from the "nr_entries" field, as there's not that many entries, and 2^30 is plenty. But the bit number for nr_entries was not updated to reflect the bit taken from it, which would add an unnecessary integer to the structure. Link: https://lore.kernel.org/linux-trace-kernel/20240102151832.7ca87275@gandalf.local.home Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: 7e8358edf503e ("eventfs: Fix file and directory uid and gid ownership") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-01-02tracefs: Check for dentry->d_inode exists in set_gid()Steven Rostedt (Google)
If a getdents() is called on the tracefs directory but does not get all the files, it can leave a "cursor" dentry in the d_subdirs list of tracefs dentry. This cursor dentry does not have a d_inode for it. Before referencing tracefs_inode from the dentry, the d_inode must first be checked if it has content. If not, then it's not a tracefs_inode and can be ignored. The following caused a crash: #define getdents64(fd, dirp, count) syscall(SYS_getdents64, fd, dirp, count) #define BUF_SIZE 256 #define TDIR "/tmp/file0" int main(void) { char buf[BUF_SIZE]; int fd; int n; mkdir(TDIR, 0777); mount(NULL, TDIR, "tracefs", 0, NULL); fd = openat(AT_FDCWD, TDIR, O_RDONLY); n = getdents64(fd, buf, BUF_SIZE); ret = mount(NULL, TDIR, NULL, MS_NOSUID|MS_REMOUNT|MS_RELATIME|MS_LAZYTIME, "gid=1000"); return 0; } That's because the 256 BUF_SIZE was not big enough to read all the dentries of the tracefs file system and it left a "cursor" dentry in the subdirs of the tracefs root inode. Then on remounting with "gid=1000", it would cause an iteration of all dentries which hit: ti = get_tracefs(dentry->d_inode); if (ti && (ti->flags & TRACEFS_EVENT_INODE)) eventfs_update_gid(dentry, gid); Which crashed because of the dereference of the cursor dentry which had a NULL d_inode. In the subdir loop of the dentry lookup of set_gid(), if a child has a NULL d_inode, simply skip it. Link: https://lore.kernel.org/all/20240102135637.3a21fb10@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20240102151249.05da244d@gandalf.local.home Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: 7e8358edf503e ("eventfs: Fix file and directory uid and gid ownership") Reported-by: "Ubisectech Sirius" <bugreport@ubisectech.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-01-02jfs: Add missing set_freezable() for freezable kthreadKevin Hao
The kernel thread function jfs_lazycommit() and jfs_sync() invoke the try_to_freeze() in its loop. But all the kernel threads are no-freezable by default. So if we want to make a kernel thread to be freezable, we have to invoke set_freezable() explicitly. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-01-02jfs: fix array-index-out-of-bounds in diNewExtEdward Adam Davis
[Syz report] UBSAN: array-index-out-of-bounds in fs/jfs/jfs_imap.c:2360:2 index -878706688 is out of range for type 'struct iagctl[128]' CPU: 1 PID: 5065 Comm: syz-executor282 Not tainted 6.7.0-rc4-syzkaller-00009-gbee0e7762ad2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:217 [inline] __ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348 diNewExt+0x3cf3/0x4000 fs/jfs/jfs_imap.c:2360 diAllocExt fs/jfs/jfs_imap.c:1949 [inline] diAllocAG+0xbe8/0x1e50 fs/jfs/jfs_imap.c:1666 diAlloc+0x1d3/0x1760 fs/jfs/jfs_imap.c:1587 ialloc+0x8f/0x900 fs/jfs/jfs_inode.c:56 jfs_mkdir+0x1c5/0xb90 fs/jfs/namei.c:225 vfs_mkdir+0x2f1/0x4b0 fs/namei.c:4106 do_mkdirat+0x264/0x3a0 fs/namei.c:4129 __do_sys_mkdir fs/namei.c:4149 [inline] __se_sys_mkdir fs/namei.c:4147 [inline] __x64_sys_mkdir+0x6e/0x80 fs/namei.c:4147 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x45/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b RIP: 0033:0x7fcb7e6a0b57 Code: ff ff 77 07 31 c0 c3 0f 1f 40 00 48 c7 c2 b8 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 b8 53 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffd83023038 EFLAGS: 00000286 ORIG_RAX: 0000000000000053 RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007fcb7e6a0b57 RDX: 00000000000a1020 RSI: 00000000000001ff RDI: 0000000020000140 RBP: 0000000020000140 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000286 R12: 00007ffd830230d0 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [Analysis] When the agstart is too large, it can cause agno overflow. [Fix] After obtaining agno, if the value is invalid, exit the subsequent process. Reported-and-tested-by: syzbot+553d90297e6d2f50dbc7@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Modified the test from agno > MAXAG to agno >= MAXAG based on linux-next report by kernel test robot (Dan Carpenter). Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-01-01bcachefs: check_directory_structure() can now be run onlineKent Overstreet
Now that we have dynamically resizable btree paths, check_directory_structure() can check one path - inode up to the root - in a single transaction. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Fix reattach_inode() for snapshotsKent Overstreet
reattach_inode() was broken w.r.t. snapshots - we'd lookup the subvolume to look up lost+found, but if we're in an interior node snapshot that didn't make any sense. Instead, this adds a dirent path for creating in a specific snapshot, skipping the subvolume; and we also make sure to create lost+found in the root snapshot, to avoid conflicts with lost+found being created in overlapping snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_btree_trans_peek_slot_updatesKent Overstreet
refactoring the BTREE_ITER_WITH_UPDATES code, prep for removing the flag and making it always-on Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_btree_trans_peek_prev_updatesKent Overstreet
bch2_btree_iter_peek_prev() now supports BTREE_ITER_WITH_UPDATES Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_btree_trans_peek_updatesKent Overstreet
refactoring the BTREE_ITER_WITH_UPDATES code, prep for removing the flag and making it always-on Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: growable btree_pathsKent Overstreet
XXX: we're allocating memory with btree locks held - bad We need to plumb through an error path so we can do allocate_dropping_locks() - but we're merging this now because it fixes a transaction path overflow caused by indirect extent fragmentation, and the resize path is rare. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Fix interior update path btree_path usesKent Overstreet
Since the btree_paths array is now about to become growable, we have to be careful not to refer to paths by pointer across contexts where they may be reallocated. This fixes the remaining btree_interior_update() paths - split and merge. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: trans->nr_pathsKent Overstreet
Start to plumb through dynamically growable btree_paths; this patch replaces most BTREE_ITER_MAX references with trans->nr_paths. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: trans->updates will also be resizableKent Overstreet
the reflink triggers are also bumping up against the maximum number of paths in a transaction - and generating proportional numbers of updates. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONSKent Overstreet
- Some tweaks to greatly reduce locking overhead for the list of btree transactions, so that it can always be enabled: leave btree_trans objects on the list when they're on the percpu single item freelist, and only check for duplicates in the same process when CONFIG_BCACHEFS_DEBUG is enabled - don't zero out the full btree_trans() unless we allocated it from the mempool Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: rcu protect trans->pathsKent Overstreet
Upcoming patches are going to be changing trans->paths to a reallocatable buffer. We need to guard against use after free when it's used by other threads; this introduces RCU protection to those paths and changes them to check for trans->paths == NULL Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Clean up btree_transKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: kill btree_path.idxKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: get_unlocked_mut_path() -> btree_path_idx_tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_btree_iter_peek_prev() no longer uses path->idxKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_path_get() no longer uses path->idxKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: trans_for_each_path_with_node() no longer uses path->idxKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: trans_for_each_path() no longer uses path->idxKent Overstreet
path->idx is now a code smell: we should be using path_idx_t, since it's stable across btree path reallocation. This is also a bit faster, using the same loop counter vs. fetching path->idx from each path we iterate over. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: kill trans_for_each_path_from()Kent Overstreet
dead code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_btree_path_to_text() -> btree_path_idx_tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: struct trans_for_each_path_inorder_iterKent Overstreet
reducing our usage of path->idx Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: btree_insert_entry -> btree_path_idx_tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: btree_iter -> btree_path_idx_tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>