summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2024-01-31btrfs: do not ASSERT() if the newly created subvolume already got readQu Wenruo
[BUG] There is a syzbot crash, triggered by the ASSERT() during subvolume creation: assertion failed: !anon_dev, in fs/btrfs/disk-io.c:1319 ------------[ cut here ]------------ kernel BUG at fs/btrfs/disk-io.c:1319! invalid opcode: 0000 [#1] PREEMPT SMP KASAN RIP: 0010:btrfs_get_root_ref.part.0+0x9aa/0xa60 <TASK> btrfs_get_new_fs_root+0xd3/0xf0 create_subvol+0xd02/0x1650 btrfs_mksubvol+0xe95/0x12b0 __btrfs_ioctl_snap_create+0x2f9/0x4f0 btrfs_ioctl_snap_create+0x16b/0x200 btrfs_ioctl+0x35f0/0x5cf0 __x64_sys_ioctl+0x19d/0x210 do_syscall_64+0x3f/0xe0 entry_SYSCALL_64_after_hwframe+0x63/0x6b ---[ end trace 0000000000000000 ]--- [CAUSE] During create_subvol(), after inserting root item for the newly created subvolume, we would trigger btrfs_get_new_fs_root() to get the btrfs_root of that subvolume. The idea here is, we have preallocated an anonymous device number for the subvolume, thus we can assign it to the new subvolume. But there is really nothing preventing things like backref walk to read the new subvolume. If that happens before we call btrfs_get_new_fs_root(), the subvolume would be read out, with a new anonymous device number assigned already. In that case, we would trigger ASSERT(), as we really expect no one to read out that subvolume (which is not yet accessible from the fs). But things like backref walk is still possible to trigger the read on the subvolume. Thus our assumption on the ASSERT() is not correct in the first place. [FIX] Fix it by removing the ASSERT(), and just free the @anon_dev, reset it to 0, and continue. If the subvolume tree is read out by something else, it should have already get a new anon_dev assigned thus we only need to free the preallocated one. Reported-by: Chenyuan Yang <chenyuan0y@gmail.com> Fixes: 2dfb1e43f57d ("btrfs: preallocate anon block device at first phase of snapshot creation") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-01-31btrfs: forbid deleting live subvol qgroupBoris Burkov
If a subvolume still exists, forbid deleting its qgroup 0/subvolid. This behavior generally leads to incorrect behavior in squotas and doesn't have a legitimate purpose. Fixes: cecbb533b5fc ("btrfs: record simple quota deltas in delayed refs") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-01-31btrfs: forbid creating subvol qgroupsBoris Burkov
Creating a qgroup 0/subvolid leads to various races and it isn't helpful, because you can't specify a subvol id when creating a subvol, so you can't be sure it will be the right one. Any requirements on the automatic subvol can be gratified by using a higher level qgroup and the inheritance parameters of subvol creation. Fixes: cecbb533b5fc ("btrfs: record simple quota deltas in delayed refs") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-01-31btrfs: send: return EOPNOTSUPP on unknown flagsDavid Sterba
When some ioctl flags are checked we return EOPNOTSUPP, like for BTRFS_SCRUB_SUPPORTED_FLAGS, BTRFS_SUBVOL_CREATE_ARGS_MASK or fallocate modes. The EINVAL is supposed to be for a supported but invalid values or combination of options. Fix that when checking send flags so it's consistent with the rest. CC: stable@vger.kernel.org # 4.14+ Link: https://lore.kernel.org/linux-btrfs/CAL3q7H5rryOLzp3EKq8RTbjMHMHeaJubfpsVLF6H4qJnKCUR1w@mail.gmail.com/ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-01-30Merge tag 'erofs-for-6.8-rc3-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - fix an infinite loop issue of sub-page compressed data support found with lengthy stress tests on a 64k-page arm64 VM - optimize the temporary buffer allocation for low-memory scenarios, which can reduce 20.21% on average under a heavy multi-app launch benchmark workload - get rid of unnecessary GFP_NOFS * tag 'erofs-for-6.8-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: relaxed temporary buffers allocation on readahead erofs: fix infinite loop due to a race of filling compressed_bvecs erofs: get rid of unneeded GFP_NOFS
2024-01-30kernfs: fix false-positive WARN(nr_mmapped) in kernfs_drain_open_filesNeel Natu
Prior to this change 'on->nr_mmapped' tracked the total number of mmaps across all of its associated open files via kernfs_fop_mmap(). Thus if the file descriptor associated with a kernfs_open_file was mmapped 10 times then we would have: 'of->mmapped = true' and 'of_on(of)->nr_mmapped = 10'. The problem is that closing or draining a 'of->mmapped' file would only decrement one from the 'of_on(of)->nr_mmapped' counter. For e.g. we have this from kernfs_unlink_open_file(): if (of->mmapped) on->nr_mmapped--; The WARN_ON_ONCE(on->nr_mmapped) in kernfs_drain_open_files() is easy to reproduce by: 1. opening a (mmap-able) kernfs file. 2. mmap-ing that file more than once (mapping just once masks the issue). 3. trigger a drain of that kernfs file. Modulo out-of-tree patches I was able to trigger this reliably by identifying pci device nodes in sysfs that have resource regions that are mmap-able and that don't have any driver attached to them (steps 1 and 2). For step 3 we can "echo 1 > remove" to trigger a kernfs_drain. Signed-off-by: Neel Natu <neelnatu@google.com> Link: https://lore.kernel.org/r/20240127234636.609265-1-neelnatu@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-30kernfs: RCU protect kernfs_nodes and avoid kernfs_idr_lock in ↵Tejun Heo
kernfs_find_and_get_node_by_id() The BPF helper bpf_cgroup_from_id() calls kernfs_find_and_get_node_by_id() which acquires kernfs_idr_lock, which is an non-raw non-IRQ-safe lock. This can lead to deadlocks as bpf_cgroup_from_id() can be called from any BPF programs including e.g. the ones that attach to functions which are holding the scheduler rq lock. Consider the following BPF program: SEC("fentry/__set_cpus_allowed_ptr_locked") int BPF_PROG(__set_cpus_allowed_ptr_locked, struct task_struct *p, struct affinity_context *affn_ctx, struct rq *rq, struct rq_flags *rf) { struct cgroup *cgrp = bpf_cgroup_from_id(p->cgroups->dfl_cgrp->kn->id); if (cgrp) { bpf_printk("%d[%s] in %s", p->pid, p->comm, cgrp->kn->name); bpf_cgroup_release(cgrp); } return 0; } __set_cpus_allowed_ptr_locked() is called with rq lock held and the above BPF program calls bpf_cgroup_from_id() within leading to the following lockdep warning: ===================================================== WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected 6.7.0-rc3-work-00053-g07124366a1d7-dirty #147 Not tainted ----------------------------------------------------- repro/1620 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: ffffffff833b3688 (kernfs_idr_lock){+.+.}-{2:2}, at: kernfs_find_and_get_node_by_id+0x1e/0x70 and this task is already holding: ffff888237ced698 (&rq->__lock){-.-.}-{2:2}, at: task_rq_lock+0x4e/0xf0 which would create a new lock dependency: (&rq->__lock){-.-.}-{2:2} -> (kernfs_idr_lock){+.+.}-{2:2} ... Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(kernfs_idr_lock); local_irq_disable(); lock(&rq->__lock); lock(kernfs_idr_lock); <Interrupt> lock(&rq->__lock); *** DEADLOCK *** ... Call Trace: dump_stack_lvl+0x55/0x70 dump_stack+0x10/0x20 __lock_acquire+0x781/0x2a40 lock_acquire+0xbf/0x1f0 _raw_spin_lock+0x2f/0x40 kernfs_find_and_get_node_by_id+0x1e/0x70 cgroup_get_from_id+0x21/0x240 bpf_cgroup_from_id+0xe/0x20 bpf_prog_98652316e9337a5a___set_cpus_allowed_ptr_locked+0x96/0x11a bpf_trampoline_6442545632+0x4f/0x1000 __set_cpus_allowed_ptr_locked+0x5/0x5a0 sched_setaffinity+0x1b3/0x290 __x64_sys_sched_setaffinity+0x4f/0x60 do_syscall_64+0x40/0xe0 entry_SYSCALL_64_after_hwframe+0x46/0x4e Let's fix it by protecting kernfs_node and kernfs_root with RCU and making kernfs_find_and_get_node_by_id() acquire rcu_read_lock() instead of kernfs_idr_lock. This adds an rcu_head to kernfs_node making it larger by 16 bytes on 64bit. Combined with the preceding rearrange patch, the net increase is 8 bytes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andrea Righi <andrea.righi@canonical.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20240109214828.252092-4-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-30xfs: remove conditional building of rt geometry validator functionsDarrick J. Wong
I mistakenly turned off CONFIG_XFS_RT in the Kconfig file for arm64 variant of the djwong-wtf git branch. Unfortunately, it took me a good hour to figure out that RT wasn't built because this is what got printed to dmesg: XFS (sda2): realtime geometry sanity check failed XFS (sda2): Metadata corruption detected at xfs_sb_read_verify+0x170/0x190 [xfs], xfs_sb block 0x0 Whereas I would have expected: XFS (sda2): Not built with CONFIG_XFS_RT XFS (sda2): RT mount failed The root cause of these problems is the conditional compilation of the new functions xfs_validate_rtextents and xfs_compute_rextslog that I introduced in the two commits listed below. The !RT versions of these functions return false and 0, respectively, which causes primary superblock validation to fail, which explains the first message. Move the two functions to other parts of libxfs that are not conditionally defined by CONFIG_XFS_RT and remove the broken stubs so that validation works again. Fixes: e14293803f4e ("xfs: don't allow overly small or large realtime volumes") Fixes: a6a38f309afc ("xfs: make rextslog computation consistent with mkfs") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-01-29Merge tag 'jfs-6.8-rc3' of github.com:kleikamp/linux-shaggyLinus Torvalds
Pull jfs fix from David Kleikamp: "Revert a bad sanity check" * tag 'jfs-6.8-rc3' of github.com:kleikamp/linux-shaggy: Revert "jfs: fix shift-out-of-bounds in dbJoin"
2024-01-29Merge tag 'trace-v6.8-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: "Two small fixes for tracefs and eventfs: - Fix register_snapshot_trigger() on allocation error If the snapshot fails to allocate, the register_snapshot_trigger() can still return success. If the call to tracing_alloc_snapshot_instance() returned anything but 0, it returned 0, but it should have been returning the error code from that allocation function. - Remove leftover code from tracefs doing a dentry walk on remount. The update_gid() function was called by the tracefs code on remount to update the gid of eventfs, but that is no longer the case, but that code wasn't deleted. Nothing calls it. Remove it" * tag 'trace-v6.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracefs: remove stale 'update_gid' code tracing/trigger: Fix to return error if failed to alloc snapshot
2024-01-29Merge tag 'mm-hotfixes-stable-2024-01-28-23-21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "22 hotfixes. 11 are cc:stable and the remainder address post-6.7 issues or aren't considered appropriate for backporting" * tag 'mm-hotfixes-stable-2024-01-28-23-21' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits) mm: thp_get_unmapped_area must honour topdown preference mm: huge_memory: don't force huge page alignment on 32 bit userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb selftests/mm: ksm_tests should only MADV_HUGEPAGE valid memory scs: add CONFIG_MMU dependency for vfree_atomic() mm/memory: fix folio_set_dirty() vs. folio_mark_dirty() in zap_pte_range() mm/huge_memory: fix folio_set_dirty() vs. folio_mark_dirty() selftests/mm: Update va_high_addr_switch.sh to check CPU for la57 flag selftests: mm: fix map_hugetlb failure on 64K page size systems MAINTAINERS: supplement of zswap maintainers update stackdepot: make fast paths lock-less again stackdepot: add stats counters exported via debugfs mm, kmsan: fix infinite recursion due to RCU critical section mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again selftests/mm: switch to bash from sh MAINTAINERS: add man-pages git trees mm: memcontrol: don't throttle dying tasks on memory.high mm: mmap: map MAP_STACK to VM_NOHUGEPAGE uprobes: use pagesize-aligned virtual address when replacing pages selftests/mm: mremap_test: fix build warning ...
2024-01-29Revert "jfs: fix shift-out-of-bounds in dbJoin"Dave Kleikamp
This reverts commit cca974daeb6c43ea971f8ceff5a7080d7d49ee30. The added sanity check is incorrect. BUDMIN is not the wrong value and is too small. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-01-29netfs: Fix missing zero-length check in unbuffered writeDavid Howells
Fix netfs_unbuffered_write_iter() to return immediately if generic_write_checks() returns 0, indicating there's nothing to write. Note that netfs_file_write_iter() already does this. Also, whilst we're at it, put in checks for the size being zero before we even take the locks. Note that generic_write_checks() can still reduce the size to zero, so we still need that check. Without this, a warning similar to the following is logged to dmesg: netfs: Zero-sized write [R=1b6da] and the syscall fails with EIO, e.g.: /sbin/ldconfig.real: Writing of cache extension data failed: Input/output error This can be reproduced on 9p by: xfs_io -f -c 'pwrite 0 0' /xfstest.test/foo Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support") Reported-by: Eric Van Hensbergen <ericvh@kernel.org> Link: https://lore.kernel.org/r/ZbQUU6QKmIftKsmo@FV7GG9FTHL/ Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20240129094924.1221977-3-dhowells@redhat.com Tested-by: Dominique Martinet <asmadeus@codewreck.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Jeff Layton <jlayton@kernel.org> cc: <v9fs@lists.linux.dev> cc: <linux_oss@crudebyte.com> cc: <netfs@lists.linux.dev> cc: <linux-fsdevel@vger.kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-29netfs: Fix i_dio_count leak on DIO read past i_sizeMarc Dionne
If netfs_begin_read gets a NETFS_DIO_READ request that begins past i_size, it won't perform any i/o and just return 0. This will leak an increment to i_dio_count that is done at the top of the function. This can cause subsequent buffered read requests to block indefinitely, waiting for a non existing dio operation to complete. Add a inode_dio_end() for the NETFS_DIO_READ case, before returning. Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20240129094924.1221977-2-dhowells@redhat.com Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Jeff Layton <jlayton@kernel.org> cc: <linux-afs@lists.infradead.org> cc: <netfs@lists.linux.dev> cc: <linux-fsdevel@vger.kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-29fs/ntfs3: Slightly simplify ntfs_inode_printk()Christophe JAILLET
The size passed to snprintf() includes the space for the trailing space. So there is no reason here not to use all the available space. So remove the -1 when computing 'name_len'. While at it, use the size of the array directly instead of the intermediate 'name_len' variable. snprintf() also guaranties that the buffer if NULL terminated, so there is no need to write an additional trailing NULL "To be sure". Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-29fs/ntfs3: Add ioctl operation for directories (FITRIM)Nekun
While ntfs3 supports discards, FITRIM ioctl() command has defined only for regular files. This may confuse users trying to invoke `fstrim` utility with the directory argument (for example, call `fstrim <mountpoint>` which is the common practice). In this case, ioctl() returns -ENOTTY without any error messages in kernel ring buffer, this may be easily interpreted as no support for discards in ntfs3 driver. Currently only FITRIM command implemented in ntfs_ioctl() and passed inode used only for dereferencing NTFS superblock, so no need for separate ioctl() handler for directories, just add existing ntfs_ioctl() handler to ntfs_dir_operations. Signed-off-by: Nekun <nekokun@firemail.cc> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-29fs/ntfs3: Fix oob in ntfs_listxattrEdward Adam Davis
The length of name cannot exceed the space occupied by ea. Reported-and-tested-by: syzbot+65e940cfb8f99a97aca7@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-29fs/ntfs3: Fix an NULL dereference bugDan Carpenter
The issue here is when this is called from ntfs_load_attr_list(). The "size" comes from le32_to_cpu(attr->res.data_size) so it can't overflow on a 64bit systems but on 32bit systems the "+ 1023" can overflow and the result is zero. This means that the kmalloc will succeed by returning the ZERO_SIZE_PTR and then the memcpy() will crash with an Oops on the next line. Fixes: be71b5cba2e6 ("fs/ntfs3: Add attrib operations") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-29xfs: reset XFS_ATTR_INCOMPLETE filter on node removalAndrey Albershteyn
In XFS_DAS_NODE_REMOVE_ATTR case, xfs_attr_mode_remove_attr() sets filter to XFS_ATTR_INCOMPLETE. The filter is then reset in xfs_attr_complete_op() if XFS_DA_OP_REPLACE operation is performed. The filter is not reset though if XFS just removes the attribute (args->value == NULL) with xfs_attr_defer_remove(). attr code goes to XFS_DAS_DONE state. Fix this by always resetting XFS_ATTR_INCOMPLETE filter. The replace operation already resets this filter in anyway and others are completed at this step hence don't need it. Fixes: fdaf1bb3cafc ("xfs: ATTR_REPLACE algorithm with LARP enabled needs rework") Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-01-29fs/ntfs3: Update inode->i_size after success write into compressed fileKonstantin Komarov
Reported-by: Giovanni Santini <giovannisantini93@yahoo.it> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-29fs/ntfs3: Fixed overflow check in mi_enum_attr()Konstantin Komarov
Reported-by: Robert Morris <rtm@csail.mit.edu> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-29fs/ntfs3: Correct function is_rst_area_validKonstantin Komarov
Reported-by: Robert Morris <rtm@csail.mit.edu> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-29fs/ntfs3: Use i_size_read and i_size_writeKonstantin Komarov
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-29fs/ntfs3: Prevent generic message "attempt to access beyond end of device"Konstantin Komarov
It used in test environment. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-29fs/ntfs3: use non-movable memory for ntfs3 MFT buffer cacheIsm Hong
Since the buffer cache for ntfs3 metadata is not released until the file system is unmounted, allocating from the movable zone may result in cma allocation failures. This is due to the page still being used by ntfs3, leading to migration failures. To address this, this commit use sb_bread_umovable() instead of sb_bread(). This change prevents allocation from the movable zone, ensuring compatibility with scenarios where the buffer head is not released until unmount. This patch is inspired by commit a8ac900b8163("ext4: use non-movable memory for the ext4 superblock"). The issue is found when playing video files stored in NTFS on the Android TV platform. During this process, the media parser reads the video file, causing ntfs3 to allocate buffer cache from the CMA area. Subsequently, the hardware decoder attempts to allocate memory from the same CMA area. However, the page is still in use by ntfs3, resulting in a migrate failure in alloc_contig_range(). The pinned page and allocating stacktrace reported by page owner shows below: page:ffffffff00b68880 refcount:3 mapcount:0 mapping:ffffff80046aa828 index:0xc0040 pfn:0x20fa4 aops:def_blk_aops ino:0 flags: 0x2020(active|private) page dumped because: migration failure page last allocated via order 0, migratetype Movable, gfp_mask 0x108c48 (GFP_NOFS|__GFP_NOFAIL|__GFP_HARDWALL|__GFP_MOVABLE), page_owner tracks the page as allocated prep_new_page get_page_from_freelist __alloc_pages_nodemask pagecache_get_page __getblk_gfp __bread_gfp ntfs_read_run_nb ntfs_read_bh mi_read ntfs_iget5 dir_search_u ntfs_lookup __lookup_slow lookup_slow walk_component path_lookupat Signed-off-by: Ism Hong <ism.hong@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-28bcachefs: unlock parent dir if entry is not found in subvolume deletionGuoyu Ou
Parent dir is locked by user_path_locked_at() before validating the required dentry. It should be unlocked if we can not perform the deletion. This fixes the problem: $ bcachefs subvolume delete not-exist-entry BCH_IOCTL_SUBVOLUME_DESTROY ioctl error: No such file or directory $ bcachefs subvolume delete not-exist-entry the second will stuck because the parent dir is locked in the previous deletion. Signed-off-by: Guoyu Ou <benogy@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-28bcachefs: Fix build on parisc by avoiding __multi3()Helge Deller
The gcc compiler on paric does support the __int128 type, although the architecture does not have native 128-bit support. The effect is, that the bcachefs u128_square() function will pull in the libgcc __multi3() helper, which breaks the kernel build when bcachefs is built as module since this function isn't currently exported in arch/parisc/kernel/parisc_ksyms.c. The build failure can be seen in the latest debian kernel build at: https://buildd.debian.org/status/fetch.php?pkg=linux&arch=hppa&ver=6.7.1-1%7Eexp1&stamp=1706132569&raw=0 We prefer to not export that symbol, so fall back to the optional 64-bit implementation provided by bcachefs and thus avoid usage of __multi3(). Signed-off-by: Helge Deller <deller@gmx.de> Cc: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-28tracefs: remove stale 'update_gid' codeLinus Torvalds
The 'eventfs_update_gid()' function is no longer called, so remove it (and the helper function it uses). Link: https://lore.kernel.org/all/CAHk-=wj+DsZZ=2iTUkJ-Nojs9fjYMvPs1NuoM3yK7aTDtJfPYQ@mail.gmail.com/ Fixes: 8186fff7ab64 ("tracefs/eventfs: Use root and instance inodes as default ownership") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-01-27Merge tag 'xfs-6.8-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fix from Chandan Babu: - Fix read only mounts when using fsopen mount API * tag 'xfs-6.8-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: read only mounts with fsopen mount API are busted
2024-01-27Merge tag 'bcachefs-2024-01-26' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: - fix for REQ_OP_FLUSH usage; this fixes filesystems going read only with -EOPNOTSUPP from the block layer. (this really should have gone in with the block layer patch causing the -EOPNOTSUPP, or should have gone in before). - fix an allocation in non-sleepable context - fix one source of srcu lock latency, on devices with terrible discard latency - fix a reattach_inode() issue in fsck * tag 'bcachefs-2024-01-26' of https://evilpiepirate.org/git/bcachefs: bcachefs: __lookup_dirent() works in snapshot, not subvol bcachefs: discard path uses unlock_long() bcachefs: fix incorrect usage of REQ_OP_FLUSH bcachefs: Add gfp flags param to bch2_prt_task_backtrace()
2024-01-27Merge tag '6.8-rc2-smb3-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: - Fix netlink OOB - Minor kernel doc fix * tag '6.8-rc2-smb3-server-fixes' of git://git.samba.org/ksmbd: ksmbd: fix global oob in ksmbd_nl_policy smb: Fix some kernel-doc comments
2024-01-27Merge tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: "Nine cifs/smb client fixes - Four network error fixes (three relating to replays of requests that need to be retried, and one fixing some places where we were returning the wrong rc up the stack on network errors) - Two multichannel fixes including locking fix and case where subset of channels need reconnect - netfs integration fixup: share remote i_size with netfslib - Two small cleanups (one for addressing a clang warning)" * tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix stray unlock in cifs_chan_skip_or_disable cifs: set replay flag for retries of write command cifs: commands that are retried should have replay flag set cifs: helper function to check replayable error codes cifs: translate network errors on send to -ECONNABORTED cifs: cifs_pick_channel should try selecting active channels cifs: Share server EOF pos with netfslib smb: Work around Clang __bdos() type confusion smb: client: delete "true", "false" defines
2024-01-27erofs: relaxed temporary buffers allocation on readaheadChunhai Guo
Even with inplace decompression, sometimes very few temporary buffers may be still needed for a single decompression shot (e.g. 16 pages for 64k sliding window or 4 pages for 16k sliding window). In low-memory scenarios, it would be better to try to allocate with GFP_NOWAIT on readahead first. That can help reduce the time spent on page allocation under durative memory pressure. Here are detailed performance numbers under multi-app launch benchmark workload [1] on ARM64 Android devices (8-core CPU and 8GB of memory) running a 5.15 LTS kernel with EROFS of 4k pclusters: +----------------------------------------------+ | LZ4 | vanilla | patched | diff | |----------------+---------+---------+---------| | Average (ms) | 3364 | 2684 | -20.21% | [64k sliding window] |----------------+---------+---------+---------| | Average (ms) | 2079 | 1610 | -22.56% | [16k sliding window] +----------------------------------------------+ The total size of system images for 4k pclusters is almost unchanged: (64k sliding window) 9,117,044 KB (16k sliding window) 9,113,096 KB Therefore, in addition to switch the sliding window from 64k to 16k, after applying this patch, it can eventually save 52.14% (3364 -> 1610) on average with no memory reservation. That is particularly useful for embedded devices with limited resources. [1] https://lore.kernel.org/r/20240109074143.4138783-1-guochunhai@vivo.com Suggested-by: Gao Xiang <xiang@kernel.org> Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Link: https://lore.kernel.org/r/20240126140142.201718-1-hsiangkao@linux.alibaba.com
2024-01-26ramfs: Initialize security of in-memory inodesRoberto Sassu
Add a call security_inode_init_security() after ramfs_get_inode(), to let LSMs initialize the inode security field. Skip ramfs_fill_super(), as the initialization is done through the sb_set_mnt_opts hook. Calling security_inode_init_security() call inside ramfs_get_inode() is not possible since, for CONFIG_SHMEM=n, tmpfs also calls the former after the latter. Pass NULL as initxattrs() callback to security_inode_init_security(), since the purpose of the call is only to initialize the in-memory inodes. Cc: Hugh Dickins <hughd@google.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2024-01-26fs/9p: fix dups even in uncached modeEric Van Hensbergen
In uncached mode we were still seeing duplicate getattr requests because of aggressive dropping of inodes. Inode "freshness" is guarded by other mechanisms when caches are disabled so this is unnecessary and increases overhead of almost every operation. Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
2024-01-26fs/9p: simplify iget to remove unnecessary pathsEric Van Hensbergen
Remove the additional comparison operators and switch to simply lookup by inode number (aka qid.path). Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
2024-01-26fs/9p: rework qid2ino logicEric Van Hensbergen
This changes from a function to a macro because we can figure out if we are 32 or 64 bit at compile time. Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
2024-01-26fs/9p: Eliminate now unused v9fs_get_inodeEric Van Hensbergen
Now with all inode allocation going through get_from_fid functions we can remove v9fs_get_inode and reduce us down to a single inode allocation path. Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
2024-01-26fs/9p: Eliminate redundant non-cache path in mknodEric Van Hensbergen
Like symlink, mknod had a seperate path with different inode allocation -- but this seems unnecessary, so eliminating this path. Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
2024-01-26fs/9p: remove walk and inode allocation from symlinkEric Van Hensbergen
Symlink had a bunch of extra operations which essentially end up discarded. It was walking the fid to the new file and creating an inode for it, but those semantics are part of tsymlink. This did prepopulate the cache, but that also seems potentially unnecessary and frought with peril. Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
2024-01-26fs/9p: convert mkdir to use get_new_inodeEric Van Hensbergen
mkdir had different code paths for inode creation, cache used the get_new_inode_from_fid helper, but non-cached used v9fs_get_inode. Collapsed into a single implementation across both as there should be no difference. Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
2024-01-26fs/9p: switch vfsmount to use v9fs_get_new_inodeEric Van Hensbergen
In the process of cleaning up inode number allocation, I noticed several functions which didn't use the standard helper allocators. This patch fixes the allocation in the mount entrypoint. Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
2024-01-26erofs: fix infinite loop due to a race of filling compressed_bvecsGao Xiang
I encountered a race issue after lengthy (~594647 secs) stress tests on a 64k-page arm64 VM with several 4k-block EROFS images. The timing is like below: z_erofs_try_inplace_io z_erofs_fill_bio_vec cmpxchg(&compressed_bvecs[].page, NULL, ..) [access bufvec] compressed_bvecs[] = *bvec; Previously, z_erofs_submit_queue() just accessed bufvec->page only, so other fields in bufvec didn't matter. After the subpage block support is landed, .offset and .end can be used too, but filling bufvec isn't an atomic operation which can cause inconsistency. Let's use a spinlock to keep the atomicity of each bufvec. More specifically, just reuse the existing spinlock `pcl->obj.lockref.lock` since it's rarely used (also it takes a short time if even used) as long as the pcluster has a reference. Fixes: 192351616a9d ("erofs: support I/O submission for sub-page compressed blocks") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Sandeep Dhavale <dhavale@google.com> Link: https://lore.kernel.org/r/20240125120039.3228103-1-hsiangkao@linux.alibaba.com
2024-01-25fs/hugetlbfs/inode.c: mm/memory-failure.c: fix hugetlbfs hwpoison handlingSidhartha Kumar
has_extra_refcount() makes the assumption that the page cache adds a ref count of 1 and subtracts this in the extra_pins case. Commit a08c7193e4f1 (mm/filemap: remove hugetlb special casing in filemap.c) modifies __filemap_add_folio() by calling folio_ref_add(folio, nr); for all cases (including hugtetlb) where nr is the number of pages in the folio. We should adjust the number of references coming from the page cache by subtracing the number of pages rather than 1. In hugetlbfs_read_iter(), folio_test_has_hwpoisoned() is testing the wrong flag as, in the hugetlb case, memory-failure code calls folio_test_set_hwpoison() to indicate poison. folio_test_hwpoison() is the correct function to test for that flag. After these fixes, the hugetlb hwpoison read selftest passes all cases. Link: https://lkml.kernel.org/r/20240112180840.367006-1-sidhartha.kumar@oracle.com Fixes: a08c7193e4f1 ("mm/filemap: remove hugetlb special casing in filemap.c") Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Closes: https://lore.kernel.org/linux-mm/20230713001833.3778937-1-jiaqiyan@google.com/T/#m8e1469119e5b831bbd05d495f96b842e4a1c5519 Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Muchun Song <muchun.song@linux.dev> Cc: James Houghton <jthoughton@google.com> Cc: Jiaqi Yan <jiaqiyan@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: <stable@vger.kernel.org> [6.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-25bcachefs: __lookup_dirent() works in snapshot, not subvolKent Overstreet
Add a new helper, bch2_hash_lookup_in_snapshot(), for when we're not operating in a subvolume and already have a snapshot ID, and then use it in lookup_lostfound() -> __lookup_dirent(). This is a bugfix - lookup_lostfound() doesn't take a subvolume ID, we were passing a nonsense subvolume ID before, and don't have one to pass since we may be operating in an interior snapshot node that doesn't have a subvolume ID. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-25Merge tag 'ovl-fixes-6.8-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs fix from Amir Goldstein: "Change the on-disk format for the new "xwhiteouts" feature introduced in v6.7 The change reduces unneeded overhead of an extra getxattr per readdir. The only user of the "xwhiteout" feature is the external composefs tool, which has been updated to support the new on-disk format. This change is also designated for 6.7.y" * tag 'ovl-fixes-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: mark xwhiteouts directory with overlay.opaque='x'
2024-01-25Merge tag 'vfs-6.8-rc2.netfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull netfs fixes from Christian Brauner: "This contains various fixes for the netfs work merged earlier this cycle: afs: - Fix locking imbalance in afs_proc_addr_prefs_show() - Remove afs_dynroot_d_revalidate() which is redundant - Fix error handling during lookup - Hide sillyrenames from userspace. This fixes a race between silly-rename files being created/removed and userspace iterating over directory entries - Don't use unnecessary folio_*() functions cifs: - Don't use unnecessary folio_*() functions cachefiles: - erofs: Fix Null dereference when cachefiles are not doing ondemand-mode - Update mailing list netfs library: - Add Jeff Layton as reviewer - Update mailing list - Fix a error checking in netfs_perform_write() - fscache: Check error before dereferencing - Don't use unnecessary folio_*() functions" * tag 'vfs-6.8-rc2.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: afs: Fix missing/incorrect unlocking of RCU read lock afs: Remove afs_dynroot_d_revalidate() as it is redundant afs: Fix error handling with lookup via FS.InlineBulkStatus afs: Hide silly-rename files from userspace cachefiles, erofs: Fix NULL deref in when cachefiles is not doing ondemand-mode netfs: Fix a NULL vs IS_ERR() check in netfs_perform_write() netfs, fscache: Prevent Oops in fscache_put_cache() cifs: Don't use certain unnecessary folio_*() functions afs: Don't use certain unnecessary folio_*() functions netfs: Don't use certain unnecessary folio_*() functions netfs: Add Jeff Layton as reviewer netfs, cachefiles: Change mailing list
2024-01-25Merge tag 'nfsd-6.8-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix in-kernel RPC UDP transport - Fix NFSv4.0 RELEASE_LOCKOWNER * tag 'nfsd-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: fix RELEASE_LOCKOWNER SUNRPC: use request size to initialize bio_vec in svc_udp_sendto()
2024-01-25ksmbd: fix global oob in ksmbd_nl_policyLin Ma
Similar to a reported issue (check the commit b33fb5b801c6 ("net: qualcomm: rmnet: fix global oob in rmnet_policy"), my local fuzzer finds another global out-of-bounds read for policy ksmbd_nl_policy. See bug trace below: ================================================================== BUG: KASAN: global-out-of-bounds in validate_nla lib/nlattr.c:386 [inline] BUG: KASAN: global-out-of-bounds in __nla_validate_parse+0x24af/0x2750 lib/nlattr.c:600 Read of size 1 at addr ffffffff8f24b100 by task syz-executor.1/62810 CPU: 0 PID: 62810 Comm: syz-executor.1 Tainted: G N 6.1.0 #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x8b/0xb3 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:284 [inline] print_report+0x172/0x475 mm/kasan/report.c:395 kasan_report+0xbb/0x1c0 mm/kasan/report.c:495 validate_nla lib/nlattr.c:386 [inline] __nla_validate_parse+0x24af/0x2750 lib/nlattr.c:600 __nla_parse+0x3e/0x50 lib/nlattr.c:697 __nlmsg_parse include/net/netlink.h:748 [inline] genl_family_rcv_msg_attrs_parse.constprop.0+0x1b0/0x290 net/netlink/genetlink.c:565 genl_family_rcv_msg_doit+0xda/0x330 net/netlink/genetlink.c:734 genl_family_rcv_msg net/netlink/genetlink.c:833 [inline] genl_rcv_msg+0x441/0x780 net/netlink/genetlink.c:850 netlink_rcv_skb+0x14f/0x410 net/netlink/af_netlink.c:2540 genl_rcv+0x24/0x40 net/netlink/genetlink.c:861 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x54e/0x800 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x930/0xe50 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0x154/0x190 net/socket.c:734 ____sys_sendmsg+0x6df/0x840 net/socket.c:2482 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536 __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fdd66a8f359 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fdd65e00168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fdd66bbcf80 RCX: 00007fdd66a8f359 RDX: 0000000000000000 RSI: 0000000020000500 RDI: 0000000000000003 RBP: 00007fdd66ada493 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffc84b81aff R14: 00007fdd65e00300 R15: 0000000000022000 </TASK> The buggy address belongs to the variable: ksmbd_nl_policy+0x100/0xa80 The buggy address belongs to the physical page: page:0000000034f47940 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1ccc4b flags: 0x200000000001000(reserved|node=0|zone=2) raw: 0200000000001000 ffffea00073312c8 ffffea00073312c8 0000000000000000 raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffffffff8f24b000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffff8f24b080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffffffff8f24b100: f9 f9 f9 f9 00 00 f9 f9 f9 f9 f9 f9 00 00 07 f9 ^ ffffffff8f24b180: f9 f9 f9 f9 00 05 f9 f9 f9 f9 f9 f9 00 00 00 05 ffffffff8f24b200: f9 f9 f9 f9 00 00 03 f9 f9 f9 f9 f9 00 00 04 f9 ================================================================== To fix it, add a placeholder named __KSMBD_EVENT_MAX and let KSMBD_EVENT_MAX to be its original value - 1 according to what other netlink families do. Also change two sites that refer the KSMBD_EVENT_MAX to correct value. Cc: stable@vger.kernel.org Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Signed-off-by: Lin Ma <linma@zju.edu.cn> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-01-25erofs: get rid of unneeded GFP_NOFSJingbo Xu
Clean up some leftovers since there is no way for EROFS to be called again from a reclaim context. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240124031945.130782-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>