summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2024-09-11Merge v6.11-rc7 into drm-nextSimona Vetter
Thomas needs 5a498d4d06d6 ("drm/fbdev-dma: Only install deferred I/O if necessary") in drm-misc, so start the backmerge cascade. Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
2024-09-06Merge tag 'v6.11-rc6-cifs-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - fix potential mount hang - fix retry problem in two types of compound operations - important netfs integration fix in SMB1 read paths - fix potential uninitialized zero point of inode - minor patch to improve debugging for potential crediting problems * tag 'v6.11-rc6-cifs-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: netfs, cifs: Improve some debugging bits cifs: Fix SMB1 readv/writev callback in the same way as SMB2/3 cifs: Fix zero_point init on inode initialisation smb: client: fix double put of @cfile in smb2_set_path_size() smb: client: fix double put of @cfile in smb2_rename_path() smb: client: fix hang in wait_for_response() for negproto
2024-09-06libfs: fix get_stashed_dentry()Christian Brauner
get_stashed_dentry() tries to optimistically retrieve a stashed dentry from a provided location. It needs to ensure to hold rcu lock before it dereference the stashed location to prevent UAF issues. Use rcu_dereference() instead of READ_ONCE() it's effectively equivalent with some lockdep bells and whistles and it communicates clearly that this expects rcu protection. Link: https://lore.kernel.org/r/20240906-vfs-hotfix-5959800ffa68@brauner Fixes: 07fd7c329839 ("libfs: add path_from_stashed()") Reported-by: syzbot+f82b36bffae7ef78b6a7@syzkaller.appspotmail.com Fixes: syzbot+f82b36bffae7ef78b6a7@syzkaller.appspotmail.com Reported-by: syzbot+cbe4b96e1194b0e34db6@syzkaller.appspotmail.com Fixes: syzbot+cbe4b96e1194b0e34db6@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-05Merge tag 'trace-v6.11-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix adding a new fgraph callback after function graph tracing has already started. If the new caller does not initialize its hash before registering the fgraph_ops, it can cause a NULL pointer dereference. Fix this by adding a new parameter to ftrace_graph_enable_direct() passing in the newly added gops directly and not rely on using the fgraph_array[], as entries in the fgraph_array[] must be initialized. Assign the new gops to the fgraph_array[] after it goes through ftrace_startup_subops() as that will properly initialize the gops->ops and initialize its hashes. - Fix a memory leak in fgraph storage memory test. If the "multiple fgraph storage on a function" boot up selftest fails in the registering of the function graph tracer, it will not free the memory it allocated for the filter. Break the loop up into two where it allocates the filters first and then registers the functions where any errors will do the appropriate clean ups. - Only clear the timerlat timers if it has an associated kthread. In the rtla tool that uses timerlat, if it was killed just as it was shutting down, the signals can free the kthread and the timer. But the closing of the timerlat files could cause the hrtimer_cancel() to be called on the already freed timer. As the kthread variable is is set to NULL when the kthreads are stopped and the timers are freed it can be used to know not to call hrtimer_cancel() on the timer if the kthread variable is NULL. - Use a cpumask to keep track of osnoise/timerlat kthreads The timerlat tracer can use user space threads for its analysis. With the killing of the rtla tool, the kernel can get confused between if it is using a user space thread to analyze or one of its own kernel threads. When this confusion happens, kthread_stop() can be called on a user space thread and bad things happen. As the kernel threads are per-cpu, a bitmask can be used to know when a kernel thread is used or when a user space thread is used. - Add missing interface_lock to osnoise/timerlat stop_kthread() The stop_kthread() function in osnoise/timerlat clears the osnoise kthread variable, and if it was a user space thread does a put_task on it. But this can race with the closing of the timerlat files that also does a put_task on the kthread, and if the race happens the task will have put_task called on it twice and oops. - Add cond_resched() to the tracing_iter_reset() loop. The latency tracers keep writing to the ring buffer without resetting when it issues a new "start" event (like interrupts being disabled). When reading the buffer with an iterator, the tracing_iter_reset() sets its pointer to that start event by walking through all the events in the buffer until it gets to the time stamp of the start event. In the case of a very large buffer, the loop that looks for the start event has been reported taking a very long time with a non preempt kernel that it can trigger a soft lock up warning. Add a cond_resched() into that loop to make sure that doesn't happen. - Use list_del_rcu() for eventfs ei->list variable It was reported that running loops of creating and deleting kprobe events could cause a crash due to the eventfs list iteration hitting a LIST_POISON variable. This is because the list is protected by SRCU but when an item is deleted from the list, it was using list_del() which poisons the "next" pointer. This is what list_del_rcu() was to prevent. * tag 'trace-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/timerlat: Add interface_lock around clearing of kthread in stop_kthread() tracing/timerlat: Only clear timer if a kthread exists tracing/osnoise: Use a cpumask to know what threads are kthreads eventfs: Use list_del_rcu() for SRCU protected list variable tracing: Avoid possible softlockup in tracing_iter_reset() tracing: Fix memory leak in fgraph storage selftest tracing: fgraph: Fix to add new fgraph_ops to array after ftrace_startup_subops()
2024-09-05eventfs: Use list_del_rcu() for SRCU protected list variableSteven Rostedt
Chi Zhiling reported: We found a null pointer accessing in tracefs[1], the reason is that the variable 'ei_child' is set to LIST_POISON1, that means the list was removed in eventfs_remove_rec. so when access the ei_child->is_freed, the panic triggered. by the way, the following script can reproduce this panic loop1 (){ while true do echo "p:kp submit_bio" > /sys/kernel/debug/tracing/kprobe_events echo "" > /sys/kernel/debug/tracing/kprobe_events done } loop2 (){ while true do tree /sys/kernel/debug/tracing/events/kprobes/ done } loop1 & loop2 [1]: [ 1147.959632][T17331] Unable to handle kernel paging request at virtual address dead000000000150 [ 1147.968239][T17331] Mem abort info: [ 1147.971739][T17331] ESR = 0x0000000096000004 [ 1147.976172][T17331] EC = 0x25: DABT (current EL), IL = 32 bits [ 1147.982171][T17331] SET = 0, FnV = 0 [ 1147.985906][T17331] EA = 0, S1PTW = 0 [ 1147.989734][T17331] FSC = 0x04: level 0 translation fault [ 1147.995292][T17331] Data abort info: [ 1147.998858][T17331] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 1148.005023][T17331] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 1148.010759][T17331] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 1148.016752][T17331] [dead000000000150] address between user and kernel address ranges [ 1148.024571][T17331] Internal error: Oops: 0000000096000004 [#1] SMP [ 1148.030825][T17331] Modules linked in: team_mode_loadbalance team nlmon act_gact cls_flower sch_ingress bonding tls macvlan dummy ib_core bridge stp llc veth amdgpu amdxcp mfd_core gpu_sched drm_exec drm_buddy radeon crct10dif_ce video drm_suballoc_helper ghash_ce drm_ttm_helper sha2_ce ttm sha256_arm64 i2c_algo_bit sha1_ce sbsa_gwdt cp210x drm_display_helper cec sr_mod cdrom drm_kms_helper binfmt_misc sg loop fuse drm dm_mod nfnetlink ip_tables autofs4 [last unloaded: tls] [ 1148.072808][T17331] CPU: 3 PID: 17331 Comm: ls Tainted: G W ------- ---- 6.6.43 #2 [ 1148.081751][T17331] Source Version: 21b3b386e948bedd29369af66f3e98ab01b1c650 [ 1148.088783][T17331] Hardware name: Greatwall GW-001M1A-FTF/GW-001M1A-FTF, BIOS KunLun BIOS V4.0 07/16/2020 [ 1148.098419][T17331] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 1148.106060][T17331] pc : eventfs_iterate+0x2c0/0x398 [ 1148.111017][T17331] lr : eventfs_iterate+0x2fc/0x398 [ 1148.115969][T17331] sp : ffff80008d56bbd0 [ 1148.119964][T17331] x29: ffff80008d56bbf0 x28: ffff001ff5be2600 x27: 0000000000000000 [ 1148.127781][T17331] x26: ffff001ff52ca4e0 x25: 0000000000009977 x24: dead000000000100 [ 1148.135598][T17331] x23: 0000000000000000 x22: 000000000000000b x21: ffff800082645f10 [ 1148.143415][T17331] x20: ffff001fddf87c70 x19: ffff80008d56bc90 x18: 0000000000000000 [ 1148.151231][T17331] x17: 0000000000000000 x16: 0000000000000000 x15: ffff001ff52ca4e0 [ 1148.159048][T17331] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 1148.166864][T17331] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff8000804391d0 [ 1148.174680][T17331] x8 : 0000000180000000 x7 : 0000000000000018 x6 : 0000aaab04b92862 [ 1148.182498][T17331] x5 : 0000aaab04b92862 x4 : 0000000080000000 x3 : 0000000000000068 [ 1148.190314][T17331] x2 : 000000000000000f x1 : 0000000000007ea8 x0 : 0000000000000001 [ 1148.198131][T17331] Call trace: [ 1148.201259][T17331] eventfs_iterate+0x2c0/0x398 [ 1148.205864][T17331] iterate_dir+0x98/0x188 [ 1148.210036][T17331] __arm64_sys_getdents64+0x78/0x160 [ 1148.215161][T17331] invoke_syscall+0x78/0x108 [ 1148.219593][T17331] el0_svc_common.constprop.0+0x48/0xf0 [ 1148.224977][T17331] do_el0_svc+0x24/0x38 [ 1148.228974][T17331] el0_svc+0x40/0x168 [ 1148.232798][T17331] el0t_64_sync_handler+0x120/0x130 [ 1148.237836][T17331] el0t_64_sync+0x1a4/0x1a8 [ 1148.242182][T17331] Code: 54ffff6c f9400676 910006d6 f9000676 (b9405300) [ 1148.248955][T17331] ---[ end trace 0000000000000000 ]--- The issue is that list_del() is used on an SRCU protected list variable before the synchronization occurs. This can poison the list pointers while there is a reader iterating the list. This is simply fixed by using list_del_rcu() that is specifically made for this purpose. Link: https://lore.kernel.org/linux-trace-kernel/20240829085025.3600021-1-chizhiling@163.com/ Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20240904131605.640d42b1@gandalf.local.home Fixes: 43aa6f97c2d03 ("eventfs: Get rid of dentry pointers without refcounts") Reported-by: Chi Zhiling <chizhiling@kylinos.cn> Tested-by: Chi Zhiling <chizhiling@kylinos.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-09-04Merge tag 'bcachefs-2024-09-04' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: - Fix a typo in the rebalance accounting changes - BCH_SB_MEMBER_INVALID: small on disk format feature which will be needed for full erasure coding support; this is only the minimum so that 6.11 can handle future versions without barfing. * tag 'bcachefs-2024-09-04' of git://evilpiepirate.org/bcachefs: bcachefs: BCH_SB_MEMBER_INVALID bcachefs: fix rebalance accounting
2024-09-04Merge tag 'for-6.11-rc6-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - followup fix for direct io and fsync under some conditions, reported by QEMU users - fix a potential leak when disabling quotas while some extent tracking work can still happen - in zoned mode handle unexpected change of zone write pointer in RAID1-like block groups, turn the zones to read-only * tag 'for-6.11-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix race between direct IO write and fsync when using same fd btrfs: zoned: handle broken write pointer on zones btrfs: qgroup: don't use extent changeset when not needed
2024-09-04Merge tag 'v6.11-rc6-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: - Fix crash in session setup - Fix locking bug - Improve access bounds checking * tag 'v6.11-rc6-server-fixes' of git://git.samba.org/ksmbd: ksmbd: Unlock on in ksmbd_tcp_set_interfaces() ksmbd: unset the binding mark of a reused connection smb: Annotate struct xattr_smb_acl with __counted_by()
2024-09-04Merge tag 'vfs-6.11-rc7.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "Two netfs fixes for this merge window: - Ensure that fscache_cookie_lru_time is deleted when the fscache module is removed to prevent UAF - Fix filemap_invalidate_inode() to use invalidate_inode_pages2_range() Before it used truncate_inode_pages_partial() which causes copy_file_range() to fail on cifs" * tag 'vfs-6.11-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fscache: delete fscache_cookie_lru_timer when fscache exits to avoid UAF mm: Fix filemap_invalidate_inode() to use invalidate_inode_pages2_range()
2024-09-04Merge tag 'mm-hotfixes-stable-2024-09-03-20-19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "17 hotfixes, 15 of which are cc:stable. Mostly MM, no identifiable theme. And a few nilfs2 fixups" * tag 'mm-hotfixes-stable-2024-09-03-20-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: alloc_tag: fix allocation tag reporting when CONFIG_MODULES=n mm: vmalloc: optimize vmap_lazy_nr arithmetic when purging each vmap_area mailmap: update entry for Jan Kuliga codetag: debug: mark codetags for poisoned page as empty mm/memcontrol: respect zswap.writeback setting from parent cg too scripts: fix gfp-translate after ___GFP_*_BITS conversion to an enum Revert "mm: skip CMA pages when they are not available" maple_tree: remove rcu_read_lock() from mt_validate() kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y mm/slub: add check for s->flags in the alloc_tagging_slab_free_hook nilfs2: fix state management in error path of log writing function nilfs2: fix missing cleanup on rollforward recovery error nilfs2: protect references to superblock parameters exposed in sysfs userfaultfd: don't BUG_ON() if khugepaged yanks our page table userfaultfd: fix checks for huge PMDs mm: vmalloc: ensure vmap_block is initialised before adding to queue selftests: mm: fix build errors on armhf
2024-09-03bcachefs: BCH_SB_MEMBER_INVALIDKent Overstreet
Create a sentinal value for "invalid device". This is needed for removing devices that have stripes on them (force removing, without evacuating); we need a sentinal value for the stripe pointers to the device being removed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-03Merge tag 'fuse-fixes-6.11-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: - Fix EIO if splice and page stealing are enabled on the fuse device - Disable problematic combination of passthrough and writeback-cache - Other bug fixes found by code review * tag 'fuse-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: disable the combination of passthrough and writeback cache fuse: update stats for pages in dropped aux writeback list fuse: clear PG_uptodate when using a stolen page fuse: fix memory leak in fuse_create_open fuse: check aborted connection before adding requests to pending list for resending fuse: use unsigned type for getxattr/listxattr size truncation
2024-09-03btrfs: fix race between direct IO write and fsync when using same fdFilipe Manana
If we have 2 threads that are using the same file descriptor and one of them is doing direct IO writes while the other is doing fsync, we have a race where we can end up either: 1) Attempt a fsync without holding the inode's lock, triggering an assertion failures when assertions are enabled; 2) Do an invalid memory access from the fsync task because the file private points to memory allocated on stack by the direct IO task and it may be used by the fsync task after the stack was destroyed. The race happens like this: 1) A user space program opens a file descriptor with O_DIRECT; 2) The program spawns 2 threads using libpthread for example; 3) One of the threads uses the file descriptor to do direct IO writes, while the other calls fsync using the same file descriptor. 4) Call task A the thread doing direct IO writes and task B the thread doing fsyncs; 5) Task A does a direct IO write, and at btrfs_direct_write() sets the file's private to an on stack allocated private with the member 'fsync_skip_inode_lock' set to true; 6) Task B enters btrfs_sync_file() and sees that there's a private structure associated to the file which has 'fsync_skip_inode_lock' set to true, so it skips locking the inode's VFS lock; 7) Task A completes the direct IO write, and resets the file's private to NULL since it had no prior private and our private was stack allocated. Then it unlocks the inode's VFS lock; 8) Task B enters btrfs_get_ordered_extents_for_logging(), then the assertion that checks the inode's VFS lock is held fails, since task B never locked it and task A has already unlocked it. The stack trace produced is the following: assertion failed: inode_is_locked(&inode->vfs_inode), in fs/btrfs/ordered-data.c:983 ------------[ cut here ]------------ kernel BUG at fs/btrfs/ordered-data.c:983! Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 9 PID: 5072 Comm: worker Tainted: G U OE 6.10.5-1-default #1 openSUSE Tumbleweed 69f48d427608e1c09e60ea24c6c55e2ca1b049e8 Hardware name: Acer Predator PH315-52/Covini_CFS, BIOS V1.12 07/28/2020 RIP: 0010:btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs] Code: 50 d6 86 c0 e8 (...) RSP: 0018:ffff9e4a03dcfc78 EFLAGS: 00010246 RAX: 0000000000000054 RBX: ffff9078a9868e98 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff907dce4a7800 RDI: ffff907dce4a7800 RBP: ffff907805518800 R08: 0000000000000000 R09: ffff9e4a03dcfb38 R10: ffff9e4a03dcfb30 R11: 0000000000000003 R12: ffff907684ae7800 R13: 0000000000000001 R14: ffff90774646b600 R15: 0000000000000000 FS: 00007f04b96006c0(0000) GS:ffff907dce480000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f32acbfc000 CR3: 00000001fd4fa005 CR4: 00000000003726f0 Call Trace: <TASK> ? __die_body.cold+0x14/0x24 ? die+0x2e/0x50 ? do_trap+0xca/0x110 ? do_error_trap+0x6a/0x90 ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a] ? exc_invalid_op+0x50/0x70 ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a] ? asm_exc_invalid_op+0x1a/0x20 ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a] ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a] btrfs_sync_file+0x21a/0x4d0 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a] ? __seccomp_filter+0x31d/0x4f0 __x64_sys_fdatasync+0x4f/0x90 do_syscall_64+0x82/0x160 ? do_futex+0xcb/0x190 ? __x64_sys_futex+0x10e/0x1d0 ? switch_fpu_return+0x4f/0xd0 ? syscall_exit_to_user_mode+0x72/0x220 ? do_syscall_64+0x8e/0x160 ? syscall_exit_to_user_mode+0x72/0x220 ? do_syscall_64+0x8e/0x160 ? syscall_exit_to_user_mode+0x72/0x220 ? do_syscall_64+0x8e/0x160 ? syscall_exit_to_user_mode+0x72/0x220 ? do_syscall_64+0x8e/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e Another problem here is if task B grabs the private pointer and then uses it after task A has finished, since the private was allocated in the stack of task A, it results in some invalid memory access with a hard to predict result. This issue, triggering the assertion, was observed with QEMU workloads by two users in the Link tags below. Fix this by not relying on a file's private to pass information to fsync that it should skip locking the inode and instead pass this information through a special value stored in current->journal_info. This is safe because in the relevant section of the direct IO write path we are not holding a transaction handle, so current->journal_info is NULL. The following C program triggers the issue: $ cat repro.c /* Get the O_DIRECT definition. */ #ifndef _GNU_SOURCE #define _GNU_SOURCE #endif #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <stdint.h> #include <fcntl.h> #include <errno.h> #include <string.h> #include <pthread.h> static int fd; static ssize_t do_write(int fd, const void *buf, size_t count, off_t offset) { while (count > 0) { ssize_t ret; ret = pwrite(fd, buf, count, offset); if (ret < 0) { if (errno == EINTR) continue; return ret; } count -= ret; buf += ret; } return 0; } static void *fsync_loop(void *arg) { while (1) { int ret; ret = fsync(fd); if (ret != 0) { perror("Fsync failed"); exit(6); } } } int main(int argc, char *argv[]) { long pagesize; void *write_buf; pthread_t fsyncer; int ret; if (argc != 2) { fprintf(stderr, "Use: %s <file path>\n", argv[0]); return 1; } fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0666); if (fd == -1) { perror("Failed to open/create file"); return 1; } pagesize = sysconf(_SC_PAGE_SIZE); if (pagesize == -1) { perror("Failed to get page size"); return 2; } ret = posix_memalign(&write_buf, pagesize, pagesize); if (ret) { perror("Failed to allocate buffer"); return 3; } ret = pthread_create(&fsyncer, NULL, fsync_loop, NULL); if (ret != 0) { fprintf(stderr, "Failed to create writer thread: %d\n", ret); return 4; } while (1) { ret = do_write(fd, write_buf, pagesize, 0); if (ret != 0) { perror("Write failed"); exit(5); } } return 0; } $ mkfs.btrfs -f /dev/sdi $ mount /dev/sdi /mnt/sdi $ timeout 10 ./repro /mnt/sdi/foo Usually the race is triggered within less than 1 second. A test case for fstests will follow soon. Reported-by: Paulo Dias <paulo.miguel.dias@gmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219187 Reported-by: Andreas Jahn <jahn-andi@web.de> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219199 Reported-by: syzbot+4704b3cc972bd76024f1@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/00000000000044ff540620d7dee2@google.com/ Fixes: 939b656bc8ab ("btrfs: fix corruption after buffer fault in during direct IO append write") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-03netfs, cifs: Improve some debugging bitsDavid Howells
Improve some debugging bits: (1) The netfslib _debug() macro doesn't need a newline in its format string. (2) Display the request debug ID and subrequest index in messages emitted in smb2_adjust_credits() to make it easier to reference in traces. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-03cifs: Fix SMB1 readv/writev callback in the same way as SMB2/3David Howells
Port a number of SMB2/3 async readv/writev fixes to the SMB1 transport: commit a88d60903696c01de577558080ec4fc738a70475 cifs: Don't advance the I/O iterator before terminating subrequest commit ce5291e56081730ec7d87bc9aa41f3de73ff3256 cifs: Defer read completion commit 1da29f2c39b67b846b74205c81bf0ccd96d34727 netfs, cifs: Fix handling of short DIO read Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib") Signed-off-by: David Howells <dhowells@redhat.com> Reported-by: Steve French <stfrench@microsoft.com> Reviewed-by: Paulo Alcantara <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-03cifs: Fix zero_point init on inode initialisationDavid Howells
Fix cifs_fattr_to_inode() such that the ->zero_point tracking variable is initialised when the inode is initialised. Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-03smb: client: fix double put of @cfile in smb2_set_path_size()Paulo Alcantara
If smb2_compound_op() is called with a valid @cfile and returned -EINVAL, we need to call cifs_get_writable_path() before retrying it as the reference of @cfile was already dropped by previous call. This fixes the following KASAN splat when running fstests generic/013 against Windows Server 2022: CIFS: Attempting to mount //w22-fs0/scratch run fstests generic/013 at 2024-09-02 19:48:59 ================================================================== BUG: KASAN: slab-use-after-free in detach_if_pending+0xab/0x200 Write of size 8 at addr ffff88811f1a3730 by task kworker/3:2/176 CPU: 3 UID: 0 PID: 176 Comm: kworker/3:2 Not tainted 6.11.0-rc6 #2 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014 Workqueue: cifsoplockd cifs_oplock_break [cifs] Call Trace: <TASK> dump_stack_lvl+0x5d/0x80 ? detach_if_pending+0xab/0x200 print_report+0x156/0x4d9 ? detach_if_pending+0xab/0x200 ? __virt_addr_valid+0x145/0x300 ? __phys_addr+0x46/0x90 ? detach_if_pending+0xab/0x200 kasan_report+0xda/0x110 ? detach_if_pending+0xab/0x200 detach_if_pending+0xab/0x200 timer_delete+0x96/0xe0 ? __pfx_timer_delete+0x10/0x10 ? rcu_is_watching+0x20/0x50 try_to_grab_pending+0x46/0x3b0 __cancel_work+0x89/0x1b0 ? __pfx___cancel_work+0x10/0x10 ? kasan_save_track+0x14/0x30 cifs_close_deferred_file+0x110/0x2c0 [cifs] ? __pfx_cifs_close_deferred_file+0x10/0x10 [cifs] ? __pfx_down_read+0x10/0x10 cifs_oplock_break+0x4c1/0xa50 [cifs] ? __pfx_cifs_oplock_break+0x10/0x10 [cifs] ? lock_is_held_type+0x85/0xf0 ? mark_held_locks+0x1a/0x90 process_one_work+0x4c6/0x9f0 ? find_held_lock+0x8a/0xa0 ? __pfx_process_one_work+0x10/0x10 ? lock_acquired+0x220/0x550 ? __list_add_valid_or_report+0x37/0x100 worker_thread+0x2e4/0x570 ? __kthread_parkme+0xd1/0xf0 ? __pfx_worker_thread+0x10/0x10 kthread+0x17f/0x1c0 ? kthread+0xda/0x1c0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x31/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Allocated by task 1118: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_kmalloc+0xaa/0xb0 cifs_new_fileinfo+0xc8/0x9d0 [cifs] cifs_atomic_open+0x467/0x770 [cifs] lookup_open.isra.0+0x665/0x8b0 path_openat+0x4c3/0x1380 do_filp_open+0x167/0x270 do_sys_openat2+0x129/0x160 __x64_sys_creat+0xad/0xe0 do_syscall_64+0xbb/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 83: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x70 poison_slab_object+0xe9/0x160 __kasan_slab_free+0x32/0x50 kfree+0xf2/0x300 process_one_work+0x4c6/0x9f0 worker_thread+0x2e4/0x570 kthread+0x17f/0x1c0 ret_from_fork+0x31/0x60 ret_from_fork_asm+0x1a/0x30 Last potentially related work creation: kasan_save_stack+0x30/0x50 __kasan_record_aux_stack+0xad/0xc0 insert_work+0x29/0xe0 __queue_work+0x5ea/0x760 queue_work_on+0x6d/0x90 _cifsFileInfo_put+0x3f6/0x770 [cifs] smb2_compound_op+0x911/0x3940 [cifs] smb2_set_path_size+0x228/0x270 [cifs] cifs_set_file_size+0x197/0x460 [cifs] cifs_setattr+0xd9c/0x14b0 [cifs] notify_change+0x4e3/0x740 do_truncate+0xfa/0x180 vfs_truncate+0x195/0x200 __x64_sys_truncate+0x109/0x150 do_syscall_64+0xbb/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 71f15c90e785 ("smb: client: retry compound request without reusing lease") Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-03smb: client: fix double put of @cfile in smb2_rename_path()Paulo Alcantara
If smb2_set_path_attr() is called with a valid @cfile and returned -EINVAL, we need to call cifs_get_writable_path() again as the reference of @cfile was already dropped by previous smb2_compound_op() call. Fixes: 71f15c90e785 ("smb: client: retry compound request without reusing lease") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-02smb: client: fix hang in wait_for_response() for negprotoPaulo Alcantara
Call cifs_reconnect() to wake up processes waiting on negotiate protocol to handle the case where server abruptly shut down and had no chance to properly close the socket. Simple reproducer: ssh 192.168.2.100 pkill -STOP smbd mount.cifs //192.168.2.100/test /mnt -o ... [never returns] Cc: Rickard Andersson <rickaran@axis.com> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-02btrfs: zoned: handle broken write pointer on zonesNaohiro Aota
Btrfs rejects to mount a FS if it finds a block group with a broken write pointer (e.g, unequal write pointers on two zones of RAID1 block group). Since such case can happen easily with a power-loss or crash of a system, we need to handle the case more gently. Handle such block group by making it unallocatable, so that there will be no writes into it. That can be done by setting the allocation pointer at the end of allocating region (= block_group->zone_capacity). Then, existing code handle zone_unusable properly. Having proper zone_capacity is necessary for the change. So, set it as fast as possible. We cannot handle RAID0 and RAID10 case like this. But, they are anyway unable to read because of a missing stripe. Fixes: 265f7237dd25 ("btrfs: zoned: allow DUP on meta-data block groups") Fixes: 568220fa9657 ("btrfs: zoned: support RAID0/1/10 on top of raid stripe tree") CC: stable@vger.kernel.org # 6.1+ Reported-by: HAN Yuwei <hrx@bupt.moe> Cc: Xuefer <xuefer@gmail.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-02btrfs: qgroup: don't use extent changeset when not neededFedor Pchelkin
The local extent changeset is passed to clear_record_extent_bits() where it may have some additional memory dynamically allocated for ulist. When qgroup is disabled, the memory is leaked because in this case the changeset is not released upon __btrfs_qgroup_release_data() return. Since the recorded contents of the changeset are not used thereafter, just don't pass it. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Reported-by: syzbot+81670362c283f3dd889c@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/000000000000aa8c0c060ade165e@google.com Fixes: af0e2aab3b70 ("btrfs: qgroup: flush reservations during quota disable") CC: stable@vger.kernel.org # 6.10+ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-01nilfs2: fix state management in error path of log writing functionRyusuke Konishi
After commit a694291a6211 ("nilfs2: separate wait function from nilfs_segctor_write") was applied, the log writing function nilfs_segctor_do_construct() was able to issue I/O requests continuously even if user data blocks were split into multiple logs across segments, but two potential flaws were introduced in its error handling. First, if nilfs_segctor_begin_construction() fails while creating the second or subsequent logs, the log writing function returns without calling nilfs_segctor_abort_construction(), so the writeback flag set on pages/folios will remain uncleared. This causes page cache operations to hang waiting for the writeback flag. For example, truncate_inode_pages_final(), which is called via nilfs_evict_inode() when an inode is evicted from memory, will hang. Second, the NILFS_I_COLLECTED flag set on normal inodes remain uncleared. As a result, if the next log write involves checkpoint creation, that's fine, but if a partial log write is performed that does not, inodes with NILFS_I_COLLECTED set are erroneously removed from the "sc_dirty_files" list, and their data and b-tree blocks may not be written to the device, corrupting the block mapping. Fix these issues by uniformly calling nilfs_segctor_abort_construction() on failure of each step in the loop in nilfs_segctor_do_construct(), having it clean up logs and segment usages according to progress, and correcting the conditions for calling nilfs_redirty_inodes() to ensure that the NILFS_I_COLLECTED flag is cleared. Link: https://lkml.kernel.org/r/20240814101119.4070-1-konishi.ryusuke@gmail.com Fixes: a694291a6211 ("nilfs2: separate wait function from nilfs_segctor_write") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: fix missing cleanup on rollforward recovery errorRyusuke Konishi
In an error injection test of a routine for mount-time recovery, KASAN found a use-after-free bug. It turned out that if data recovery was performed using partial logs created by dsync writes, but an error occurred before starting the log writer to create a recovered checkpoint, the inodes whose data had been recovered were left in the ns_dirty_files list of the nilfs object and were not freed. Fix this issue by cleaning up inodes that have read the recovery data if the recovery routine fails midway before the log writer starts. Link: https://lkml.kernel.org/r/20240810065242.3701-1-konishi.ryusuke@gmail.com Fixes: 0f3e1c7f23f8 ("nilfs2: recovery functions") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01nilfs2: protect references to superblock parameters exposed in sysfsRyusuke Konishi
The superblock buffers of nilfs2 can not only be overwritten at runtime for modifications/repairs, but they are also regularly swapped, replaced during resizing, and even abandoned when degrading to one side due to backing device issues. So, accessing them requires mutual exclusion using the reader/writer semaphore "nilfs->ns_sem". Some sysfs attribute show methods read this superblock buffer without the necessary mutual exclusion, which can cause problems with pointer dereferencing and memory access, so fix it. Link: https://lkml.kernel.org/r/20240811100320.9913-1-konishi.ryusuke@gmail.com Fixes: da7141fb78db ("nilfs2: add /sys/fs/nilfs2/<device> group") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01bcachefs: fix rebalance accountingKent Overstreet
Fixes: 49aa7830396b ("bcachefs: Fix rebalance_work accounting") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-01fscache: delete fscache_cookie_lru_timer when fscache exits to avoid UAFBaokun Li
The fscache_cookie_lru_timer is initialized when the fscache module is inserted, but is not deleted when the fscache module is removed. If timer_reduce() is called before removing the fscache module, the fscache_cookie_lru_timer will be added to the timer list of the current cpu. Afterwards, a use-after-free will be triggered in the softIRQ after removing the fscache module, as follows: ================================================================== BUG: unable to handle page fault for address: fffffbfff803c9e9 PF: supervisor read access in kernel mode PF: error_code(0x0000) - not-present page PGD 21ffea067 P4D 21ffea067 PUD 21ffe6067 PMD 110a7c067 PTE 0 Oops: Oops: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Tainted: G W 6.11.0-rc3 #855 Tainted: [W]=WARN RIP: 0010:__run_timer_base.part.0+0x254/0x8a0 Call Trace: <IRQ> tmigr_handle_remote_up+0x627/0x810 __walk_groups.isra.0+0x47/0x140 tmigr_handle_remote+0x1fa/0x2f0 handle_softirqs+0x180/0x590 irq_exit_rcu+0x84/0xb0 sysvec_apic_timer_interrupt+0x6e/0x90 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1a/0x20 RIP: 0010:default_idle+0xf/0x20 default_idle_call+0x38/0x60 do_idle+0x2b5/0x300 cpu_startup_entry+0x54/0x60 start_secondary+0x20d/0x280 common_startup_64+0x13e/0x148 </TASK> Modules linked in: [last unloaded: netfs] ================================================================== Therefore delete fscache_cookie_lru_timer when removing the fscahe module. Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240826112056.2458299-1-libaokun@huaweicloud.com Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-01Merge tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - copy_file_range fix - two read fixes including read past end of file rc fix and read retry crediting fix - falloc zero range fix * tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix FALLOC_FL_ZERO_RANGE to preflush buffered part of target region cifs: Fix copy offload to flush destination region netfs, cifs: Fix handling of short DIO read cifs: Fix lack of credit renegotiation on read retry
2024-09-01Merge tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefsLinus Torvalds
Push bcachefs fixes from Kent Overstreet: "The data corruption in the buffered write path is troubling; inode lock should not have been able to cause that... - Fix a rare data corruption in the rebalance path, caught as a nonce inconsistency on encrypted filesystems - Revert lockless buffered write path - Mark more errors as autofix" * tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs: bcachefs: Mark more errors as autofix bcachefs: Revert lockless buffered IO path bcachefs: Fix bch2_extents_match() false positive bcachefs: Fix failure to return error in data_update_index_update()
2024-08-31bcachefs: Mark more errors as autofixKent Overstreet
errors that are known to always be safe to fix should be autofix: this should be most errors even at this point, but that will need some thorough review. note that errors are still logged in the superblock, so we'll still know that they happened. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-31bcachefs: Revert lockless buffered IO pathKent Overstreet
We had a report of data corruption on nixos when building installer images. https://github.com/NixOS/nixpkgs/pull/321055#issuecomment-2184131334 It seems that writes are being dropped, but only when issued by QEMU, and possibly only in snapshot mode. It's undetermined if it's write calls are being dropped or dirty folios. Further testing, via minimizing the original patch to just the change that skips the inode lock on non appends/truncates, reveals that it really is just not taking the inode lock that causes the corruption: it has nothing to do with the other logic changes for preserving write atomicity in corner cases. It's also kernel config dependent: it doesn't reproduce with the minimal kernel config that ktest uses, but it does reproduce with nixos's distro config. Bisection the kernel config initially pointer the finger at page migration or compaction, but it appears that was erroneous; we haven't yet determined what kernel config option actually triggers it. Sadly it appears this will have to be reverted since we're getting too close to release and my plate is full, but we'd _really_ like to fully debug it. My suspicion is that this patch is exposing a preexisting bug - the inode lock actually covers very little in IO paths, and we have a different lock (the pagecache add lock) that guards against races with truncate here. Fixes: 7e64c86cdc6c ("bcachefs: Buffered write path now can avoid the inode lock") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-01Merge tag 'nfsd-6.11-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - One more write delegation fix * tag 'nfsd-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: fix nfsd4_deleg_getattr_conflict in presence of third party lease
2024-09-01Merge tag 'xfs-6.11-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Chandan Babu: - Do not call out v1 inodes with non-zero di_nlink field as being corrupt - Change xfs_finobt_count_blocks() to count "free inode btree" blocks rather than "inode btree" blocks - Don't report the number of trimmed bytes via FITRIM because the underlying storage isn't required to do anything and failed discard IOs aren't reported to the caller anyway - Fix incorrect setting of rm_owner field in an rmap query - Report missing disk offset range in an fsmap query - Obtain m_growlock when extending realtime section of the filesystem - Reset rootdir extent size hint after extending realtime section of the filesystem * tag 'xfs-6.11-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: reset rootdir extent size hint after growfsrt xfs: take m_growlock when running growfsrt xfs: Fix missing interval for missing_owner in xfs fsmap xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code xfs: Fix the owner setting issue for rmap query in xfs fsmap xfs: don't bother reporting blocks trimmed via FITRIM xfs: xfs_finobt_count_blocks() walks the wrong btree xfs: fix folio dirtying for XFILE_ALLOC callers xfs: fix di_onlink checking for V1/V2 inodes
2024-08-30nfsd: fix nfsd4_deleg_getattr_conflict in presence of third party leaseNeilBrown
It is not safe to dereference fl->c.flc_owner without first confirming fl->fl_lmops is the expected manager. nfsd4_deleg_getattr_conflict() tests fl_lmops but largely ignores the result and assumes that flc_owner is an nfs4_delegation anyway. This is wrong. With this patch we restore the "!= &nfsd_lease_mng_ops" case to behave as it did before the change mentioned below. This is the same as the current code, but without any reference to a possible delegation. Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-08-29ksmbd: Unlock on in ksmbd_tcp_set_interfaces()Dan Carpenter
Unlock before returning an error code if this allocation fails. Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-29ksmbd: unset the binding mark of a reused connectionNamjae Jeon
Steve French reported null pointer dereference error from sha256 lib. cifs.ko can send session setup requests on reused connection. If reused connection is used for binding session, conn->binding can still remain true and generate_preauth_hash() will not set sess->Preauth_HashValue and it will be NULL. It is used as a material to create an encryption key in ksmbd_gen_smb311_encryptionkey. ->Preauth_HashValue cause null pointer dereference error from crypto_shash_update(). BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 8 PID: 429254 Comm: kworker/8:39 Hardware name: LENOVO 20MAS08500/20MAS08500, BIOS N2CET69W (1.52 ) Workqueue: ksmbd-io handle_ksmbd_work [ksmbd] RIP: 0010:lib_sha256_base_do_update.isra.0+0x11e/0x1d0 [sha256_ssse3] <TASK> ? show_regs+0x6d/0x80 ? __die+0x24/0x80 ? page_fault_oops+0x99/0x1b0 ? do_user_addr_fault+0x2ee/0x6b0 ? exc_page_fault+0x83/0x1b0 ? asm_exc_page_fault+0x27/0x30 ? __pfx_sha256_transform_rorx+0x10/0x10 [sha256_ssse3] ? lib_sha256_base_do_update.isra.0+0x11e/0x1d0 [sha256_ssse3] ? __pfx_sha256_transform_rorx+0x10/0x10 [sha256_ssse3] ? __pfx_sha256_transform_rorx+0x10/0x10 [sha256_ssse3] _sha256_update+0x77/0xa0 [sha256_ssse3] sha256_avx2_update+0x15/0x30 [sha256_ssse3] crypto_shash_update+0x1e/0x40 hmac_update+0x12/0x20 crypto_shash_update+0x1e/0x40 generate_key+0x234/0x380 [ksmbd] generate_smb3encryptionkey+0x40/0x1c0 [ksmbd] ksmbd_gen_smb311_encryptionkey+0x72/0xa0 [ksmbd] ntlm_authenticate.isra.0+0x423/0x5d0 [ksmbd] smb2_sess_setup+0x952/0xaa0 [ksmbd] __process_request+0xa3/0x1d0 [ksmbd] __handle_ksmbd_work+0x1c4/0x2f0 [ksmbd] handle_ksmbd_work+0x2d/0xa0 [ksmbd] process_one_work+0x16c/0x350 worker_thread+0x306/0x440 ? __pfx_worker_thread+0x10/0x10 kthread+0xef/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x44/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> Fixes: f5a544e3bab7 ("ksmbd: add support for SMB3 multichannel") Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-29smb: Annotate struct xattr_smb_acl with __counted_by()Thorsten Blum
Add the __counted_by compiler attribute to the flexible array member entries to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-30Merge tag 'execve-v6.11-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve fix from Kees Cook: - binfmt_elf_fdpic: fix AUXV size with ELF_HWCAP2 (Max Filippov) * tag 'execve-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: binfmt_elf_fdpic: fix AUXV size calculation when ELF_HWCAP2 is defined
2024-08-30dcache: keep dentry_hashtable or d_hash_shift even when not usedStephen Brennan
The runtime constant feature removes all the users of these variables, allowing the compiler to optimize them away. It's quite difficult to extract their values from the kernel text, and the memory saved by removing them is tiny, and it was never the point of this optimization. Since the dentry_hashtable is a core data structure, it's valuable for debugging tools to be able to read it easily. For instance, scripts built on drgn, like the dentrycache script[1], rely on it to be able to perform diagnostics on the contents of the dcache. Annotate it as used, so the compiler doesn't discard it. Link: https://github.com/oracle-samples/drgn-tools/blob/3afc56146f54d09dfd1f6d3c1b7436eda7e638be/drgn_tools/dentry.py#L325-L355 [1] Fixes: e3c92e81711d ("runtime constants: add x86 architecture support") Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-08-29fuse: disable the combination of passthrough and writeback cacheBernd Schubert
Current design and handling of passthrough is without fuse caching and with that FUSE_WRITEBACK_CACHE is conflicting. Fixes: 7dc4e97a4f9a ("fuse: introduce FUSE_PASSTHROUGH capability") Cc: stable@kernel.org # v6.9 Signed-off-by: Bernd Schubert <bschubert@ddn.com> Acked-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-08-28cifs: Fix FALLOC_FL_ZERO_RANGE to preflush buffered part of target regionDavid Howells
Under certain conditions, the range to be cleared by FALLOC_FL_ZERO_RANGE may only be buffered locally and not yet have been flushed to the server. For example: xfs_io -f -t -c "pwrite -S 0x41 0 4k" \ -c "pwrite -S 0x42 4k 4k" \ -c "fzero 0 4k" \ -c "pread -v 0 8k" /xfstest.test/foo will write two 4KiB blocks of data, which get buffered in the pagecache, and then fallocate() is used to clear the first 4KiB block on the server - but we don't flush the data first, which means the EOF position on the server is wrong, and so the FSCTL_SET_ZERO_DATA RPC fails (and xfs_io ignores the error), but then when we try to read it, we see the old data. Fix this by preflushing any part of the target region that above the server's idea of the EOF position to force the server to update its EOF position. Note, however, that we don't want to simply expand the file by moving the EOF before doing the FSCTL_SET_ZERO_DATA[*] because someone else might see the zeroed region or if the RPC fails we then have to try to clean it up or risk getting corruption. [*] And we have to move the EOF first otherwise FSCTL_SET_ZERO_DATA won't do what we want. This fixes the generic/008 xfstest. [!] Note: A better way to do this might be to split the operation into two parts: we only do FSCTL_SET_ZERO_DATA for the part of the range below the server's EOF and then, if that worked, invalidate the buffered pages for the part above the range. Fixes: 6b69040247e1 ("cifs/smb3: Fix data inconsistent when zero file range") Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <stfrench@microsoft.com> cc: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> cc: Pavel Shilovsky <pshilov@microsoft.com> cc: Paulo Alcantara <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-29Merge tag 'nfsd-6.11-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix a number of crashers - Update email address for an NFSD reviewer * tag 'nfsd-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: fs/nfsd: fix update of inode attrs in CB_GETATTR nfsd: fix potential UAF in nfsd4_cb_getattr_release nfsd: hold reference to delegation when updating it for cb_getattr MAINTAINERS: Update Olga Kornievskaia's email address nfsd: prevent panic for nfsv4.0 closed files in nfs4_show_open nfsd: ensure that nfsd4_fattr_args.context is zeroed out
2024-08-29Merge tag 'for-6.11-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix use-after-free when submitting bios for read, after an error and partially submitted bio the original one is freed while it can be still be accessed again - fix fstests case btrfs/301, with enabled quotas wait for delayed iputs when flushing delalloc - fix periodic block group reclaim, an unitialized value can be returned if there are no block groups to reclaim - fix build warning (-Wmaybe-uninitialized) * tag 'for-6.11-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix uninitialized return value from btrfs_reclaim_sweep() btrfs: fix a use-after-free when hitting errors inside btrfs_submit_chunk() btrfs: initialize last_extent_end to fix -Wmaybe-uninitialized warning in extent_fiemap() btrfs: run delayed iputs when flushing delalloc
2024-08-28fuse: update stats for pages in dropped aux writeback listJoanne Koong
In the case where the aux writeback list is dropped (e.g. the pages have been truncated or the connection is broken), the stats for its pages and backing device info need to be updated as well. Fixes: e2653bd53a98 ("fuse: fix leaked aux requests") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Cc: <stable@vger.kernel.org> # v5.1 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-08-28fuse: clear PG_uptodate when using a stolen pageMiklos Szeredi
Originally when a stolen page was inserted into fuse's page cache by fuse_try_move_page(), it would be marked uptodate. Then fuse_readpages_end() would call SetPageUptodate() again on the already uptodate page. Commit 413e8f014c8b ("fuse: Convert fuse_readpages_end() to use folio_end_read()") changed that by replacing the SetPageUptodate() + unlock_page() combination with folio_end_read(), which does mostly the same, except it sets the uptodate flag with an xor operation, which in the above scenario resulted in the uptodate flag being cleared, which in turn resulted in EIO being returned on the read. Fix by clearing PG_uptodate instead of setting it in fuse_try_move_page(), conforming to the expectation of folio_end_read(). Reported-by: Jürg Billeter <j@bitron.ch> Debugged-by: Matthew Wilcox <willy@infradead.org> Fixes: 413e8f014c8b ("fuse: Convert fuse_readpages_end() to use folio_end_read()") Cc: <stable@vger.kernel.org> # v6.10 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-08-28fuse: fix memory leak in fuse_create_openyangyun
The memory of struct fuse_file is allocated but not freed when get_create_ext return error. Fixes: 3e2b6fdbdc9a ("fuse: send security context of inode on file") Cc: stable@vger.kernel.org # v5.17 Signed-off-by: yangyun <yangyun50@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-08-28fuse: check aborted connection before adding requests to pending list for ↵Joanne Koong
resending There is a race condition where inflight requests will not be aborted if they are in the middle of being re-sent when the connection is aborted. If fuse_resend has already moved all the requests in the fpq->processing lists to its private queue ("to_queue") and then the connection starts and finishes aborting, these requests will be added to the pending queue and remain on it indefinitely. Fixes: 760eac73f9f6 ("fuse: Introduce a new notification type for resend pending requests") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Cc: <stable@vger.kernel.org> # v6.9 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-08-28fuse: use unsigned type for getxattr/listxattr size truncationJann Horn
The existing code uses min_t(ssize_t, outarg.size, XATTR_LIST_MAX) when parsing the FUSE daemon's response to a zero-length getxattr/listxattr request. On 32-bit kernels, where ssize_t and outarg.size are the same size, this is wrong: The min_t() will pass through any size values that are negative when interpreted as signed. fuse_listxattr() will then return this userspace-supplied negative value, which callers will treat as an error value. This kind of bug pattern can lead to fairly bad security bugs because of how error codes are used in the Linux kernel. If a caller were to convert the numeric error into an error pointer, like so: struct foo *func(...) { int len = fuse_getxattr(..., NULL, 0); if (len < 0) return ERR_PTR(len); ... } then it would end up returning this userspace-supplied negative value cast to a pointer - but the caller of this function wouldn't recognize it as an error pointer (IS_ERR_VALUE() only detects values in the narrow range in which legitimate errno values are), and so it would just be treated as a kernel pointer. I think there is at least one theoretical codepath where this could happen, but that path would involve virtio-fs with submounts plus some weird SELinux configuration, so I think it's probably not a concern in practice. Cc: stable@vger.kernel.org # v4.9 Fixes: 63401ccdb2ca ("fuse: limit xattr returned size") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-08-28cifs: Fix copy offload to flush destination regionDavid Howells
Fix cifs_file_copychunk_range() to flush the destination region before invalidating it to avoid potential loss of data should the copy fail, in whole or in part, in some way. Fixes: 7b2404a886f8 ("cifs: Fix flushing, invalidation and file size with copy_file_range()") Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <stfrench@microsoft.com> cc: Paulo Alcantara <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Matthew Wilcox <willy@infradead.org> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-mm@kvack.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-28netfs, cifs: Fix handling of short DIO readDavid Howells
Short DIO reads, particularly in relation to cifs, are not being handled correctly by cifs and netfslib. This can be tested by doing a DIO read of a file where the size of read is larger than the size of the file. When it crosses the EOF, it gets a short read and this gets retried, and in the case of cifs, the retry read fails, with the failure being translated to ENODATA. Fix this by the following means: (1) Add a flag, NETFS_SREQ_HIT_EOF, for the filesystem to set when it detects that the read did hit the EOF. (2) Make the netfslib read assessment stop processing subrequests when it encounters one with that flag set. (3) Return rreq->transferred, the accumulated contiguous amount read to that point, to userspace for a DIO read. (4) Make cifs set the flag and clear the error if the read RPC returned ENODATA. (5) Make cifs set the flag and clear the error if a short read occurred without error and the read-to file position is now at the remote inode size. Fixes: 69c3c023af25 ("cifs: Implement netfslib hooks") Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-28cifs: Fix lack of credit renegotiation on read retryDavid Howells
When netfslib asks cifs to issue a read operation, it prefaces this with a call to ->clamp_length() which cifs uses to negotiate credits, providing receive capacity on the server; however, in the event that a read op needs reissuing, netfslib doesn't call ->clamp_length() again as that could shorten the subrequest, leaving a gap. This causes the retried read to be done with zero credits which causes the server to reject it with STATUS_INVALID_PARAMETER. This is a problem for a DIO read that is requested that would go over the EOF. The short read will be retried, causing EINVAL to be returned to the user when it fails. Fix this by making cifs_req_issue_read() negotiate new credits if retrying (NETFS_SREQ_RETRYING now gets set in the read side as well as the write side in this instance). This isn't sufficient, however: the new credits might not be sufficient to complete the remainder of the read, so also add an additional field, rreq->actual_len, that holds the actual size of the op we want to perform without having to alter subreq->len. We then rely on repeated short reads being retried until we finish the read or reach the end of file and make a zero-length read. Also fix a couple of places where the subrequest start and length need to be altered by the amount so far transferred when being used. Fixes: 69c3c023af25 ("cifs: Implement netfslib hooks") Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>