summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2022-11-14fs/ntfs3: Check fields while readingKonstantin Komarov
Added new functions index_hdr_check and index_buf_check. Now we check all stuff for correctness while reading from disk. Also fixed bug with stale nfs data. Reported-by: van fantasy <g1042620637@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-14fs/ntfs3: Correct ntfs_check_for_free_spaceKonstantin Komarov
zlen in some cases was bigger than correct value. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-14fs/ntfs3: Restore correct state after ENOSPC in attr_data_get_blockKonstantin Komarov
Added new function ntfs_check_for_free_space. Added undo mechanism in attr_data_get_block. Fixes xfstest generic/083 Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-14fs/ntfs3: Changing locking in ntfs_renameKonstantin Komarov
In some cases we can be in deadlock because we tried to lock the same dir. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-14fs/ntfs3: Fixing wrong logic in attr_set_size and ntfs_fallocateKonstantin Komarov
There were 2 problems: - in some cases we lost dirty flag; - cluster allocation can be called even when it wasn't needed. Fixes xfstest generic/465 Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-14fs/ntfs3: atomic_open implementationKonstantin Komarov
Added ntfs_atomic_open function. Relaxed locking in ntfs_create_inode. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-14fs/ntfs3: Fix wrong indentationsKonstantin Komarov
Also simplifying code. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-14fs/ntfs3: Change new sparse cluster processingKonstantin Komarov
Remove ntfs_sparse_cluster. Zero clusters in attr_allocate_clusters. Fixes xfstest generic/263 Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-14fs/ntfs3: Fixing work with sparse clustersKonstantin Komarov
Simplify logic in ntfs_extend_initialized_size, ntfs_sparse_cluster and ntfs_fallocate. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-14fs/ntfs3: Simplify ntfs_update_mftmirr functionKonstantin Komarov
Make err assignment in one place. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-14fs/ntfs3: Remove unused functionsKonstantin Komarov
Removed attr_must_be_resident and ntfs_query_def. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-14fs/ntfs3: Fix sparse problemsKonstantin Komarov
Fixing various problems, detected by sparse. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-14fs/ntfs3: Add ntfs_bitmap_weight_le function and refactoringKonstantin Komarov
Added ntfs_bitmap_weight_le function. Changed argument types of bits/bitmap functions. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-14fs/ntfs3: Use _le variants of bitops functionsThomas Kühnel
The functions from bitops.h already have _le variants so use them to prevent invalid reads/writes of the bitmap on big endian systems. Signed-off-by: Thomas Kühnel <thomas.kuehnel@avm.de> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-14fs/ntfs3: Add functions to modify LE bitmapsThomas Kühnel
__bitmap_set/__bitmap_clear only works with bitmaps in CPU order. Define a variant of these functions in ntfs3 to handle modifying bitmaps read from the filesystem. Signed-off-by: Thomas Kühnel <thomas.kuehnel@avm.de> Reviewed-by: Nicolas Schier <n.schier@avm.de> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-14fs/ntfs3: Fix endian conversion in ni_fname_nameThomas Kühnel
ni_fname_name called ntfs_cmp_names_cpu which assumes that the first string is in CPU byte order and the second one in little endian. In this case both strings are little endian so ntfs_cmp_names is the correct function to call. Signed-off-by: Thomas Kühnel <thomas.kuehnel@avm.de> Reviewed-by: Nicolas Schier <n.schier@avm.de> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-14fs/ntfs3: Fix slab-out-of-bounds in r_pageYin Xiujiang
When PAGE_SIZE is 64K, if read_log_page is called by log_read_rst for the first time, the size of *buffer would be equal to DefaultLogPageSize(4K).But for *buffer operations like memcpy, if the memory area size(n) which being assigned to buffer is larger than 4K (log->page_size(64K) or bytes(64K-page_off)), it will cause an out of boundary error. Call trace: [...] kasan_report+0x44/0x130 check_memory_region+0xf8/0x1a0 memcpy+0xc8/0x100 ntfs_read_run_nb+0x20c/0x460 read_log_page+0xd0/0x1f4 log_read_rst+0x110/0x75c log_replay+0x1e8/0x4aa0 ntfs_loadlog_and_replay+0x290/0x2d0 ntfs_fill_super+0x508/0xec0 get_tree_bdev+0x1fc/0x34c [...] Fix this by setting variable r_page to NULL in log_read_rst. Signed-off-by: Yin Xiujiang <yinxiujiang@kylinos.cn> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-14erofs: fix missing xas_retry() in fscache modeJingbo Xu
The xarray iteration only holds the RCU read lock and thus may encounter XA_RETRY_ENTRY if there's process modifying the xarray concurrently. This will cause oops when referring to the invalid entry. Fix this by adding the missing xas_retry(), which will make the iteration wind back to the root node if XA_RETRY_ENTRY is encountered. Fixes: d435d53228dd ("erofs: change to use asynchronous io for fscache readpage/readahead") Suggested-by: David Howells <dhowells@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com> Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Link: https://lore.kernel.org/r/20221114121943.29987-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-11-14NFSD: Fix trace_nfsd_fh_verify_err() crasherChuck Lever
Now that the nfsd_fh_verify_err() tracepoint is always called on error, it needs to handle cases where the filehandle is not yet fully formed. Fixes: 93c128e709ae ("nfsd: ensure we always call fh_verify_error tracepoint") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-11-14ceph: fix NULL pointer dereference for req->r_sessionXiubo Li
The request's r_session maybe changed when it was forwarded or resent. Both the forwarding and resending cases the requests will be protected by the mdsc->mutex. Cc: stable@vger.kernel.org Link: https://bugzilla.redhat.com/show_bug.cgi?id=2137955 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-11-14ceph: avoid putting the realm twice when decoding snaps failsXiubo Li
When decoding the snaps fails it maybe leaving the 'first_realm' and 'realm' pointing to the same snaprealm memory. And then it'll put it twice and could cause random use-after-free, BUG_ON, etc issues. Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/57686 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-11-14ceph: fix a NULL vs IS_ERR() check when calling ceph_lookup_inode()Dan Carpenter
The ceph_lookup_inode() function returns error pointers. It never returns NULL. Fixes: aa87052dd965 ("ceph: fix incorrectly showing the .snap size for stat") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-11-13orangefs: fix mode handlingChristian Brauner
In 4053d2500beb ("orangefs: rework posix acl handling when creating new filesystem objects") we tried to precalculate the correct mode when creating a new inode. However, this leads to regressions when creating new filesystem objects. Even if we precalculate the mode we still need to call __orangefs_setattr() to perform additional checks and we also need to update the mode of ACL_TYPE_ACCESS acls set on the inode. The patch referenced above regressed that. Restore that part of the old behavior and remove the mode precalculation as it doesn't get us anything anymore. Fixes: 4053d2500beb ("orangefs: rework posix acl handling when creating new filesystem objects") Reported-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-11-12fs/ntfs3: Delete duplicate condition in ntfs_read_mft()Dan Carpenter
There were two patches which addressed the same bug and added the same condition: commit 6db620863f85 ("fs/ntfs3: Validate data run offset") commit 887bfc546097 ("fs/ntfs3: Fix slab-out-of-bounds read in run_unpack") Delete one condition. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-12fs/ntfs3: Add system.ntfs_attrib_be extended attributeDaniel Pinto
NTFS-3G provides the system.ntfs_attrib_be extended attribute, which has the same value as system.ntfs_attrib but represented in big-endian. Some utilities rely on the existence of this extended attribute. Improves compatibility with NTFS-3G by adding the system.ntfs_attrib_be extended attribute. Signed-off-by: Daniel Pinto <danielpinto52@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-12fs/ntfs3: Rename hidedotfiles mount option to hide_dot_filesDaniel Pinto
The hidedotfiles mount option provides the same functionality as the NTFS-3G hide_dot_files mount option. As such, it should be named the same for compatibility with NTGS-3G. Rename the hidedotfiles to hide_dot_files for compatbility with NTFS-3G. Signed-off-by: Daniel Pinto <danielpinto52@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-12fs/ntfs3: Add hidedotfiles to the list of enabled mount optionsDaniel Pinto
Currently, the ntfs3 driver does return the hidedotfiles mount option in the list of enabled mount options. This can confuse users who may doubt they enabled the option when not seeing in the list provided by the mount command. Add hidedotfiles mount option to the list of enabled options provided by the mount command when it is enabled. Signed-off-by: Daniel Pinto <danielpinto52@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-12fs/ntfs3: Make hidedotfiles mount option work when renaming filesDaniel Pinto
Currently, the hidedotfiles mount option only has an effect when creating new files. Removing or adding the starting dot when moving or renaming files does not update the hidden attribute. Make hidedotfiles also set or uset the hidden attribute when a file gains or loses its starting dot by being moved or renamed. Signed-off-by: Daniel Pinto <danielpinto52@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-12fs/ntfs3: Fix hidedotfiles mount option by reversing behaviourDaniel Pinto
Currently, the hidedotfiles mount option is behaving in the reverse way of what would be expected: enabling it disables setting the hidden attribute on files or directories with names starting with a dot and disabling it enables the setting. Reverse the behaviour of the hidedotfiles mount option so it matches what is expected. Signed-off-by: Daniel Pinto <danielpinto52@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-12fs/ntfs3: Add windows_names mount optionDaniel Pinto
When enabled, the windows_names mount option prevents the creation of files or directories with names not allowed by Windows. Use the same option name as NTFS-3G for compatibility. Signed-off-by: Daniel Pinto <danielpinto52@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-12fs/ntfs3: Eliminate unnecessary ternary operator in ntfs_d_compare()Nathan Chancellor
'a == b ? 0 : 1' is logically equivalent to 'a != b'. Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-12fs/ntfs3: Validate attribute data and valid sizesAbdun Nihaal
The data_size and valid_size fields of non resident attributes should be less than the its alloc_size field, but this is not checked in ntfs_read_mft function. Syzbot reports a allocation order warning due to a large unchecked value of data_size getting assigned to inode->i_size which is then passed to kcalloc. Add sanity check for ensuring that the data_size and valid_size fields are not larger than alloc_size field. Link: https://syzkaller.appspot.com/bug?extid=fa4648a5446460b7b963 Reported-and-tested-by: syzbot+fa4648a5446460b7b963@syzkaller.appspotmail.com Fixes: (82cae269cfa95) fs/ntfs3: Add initialization of super block Signed-off-by: Abdun Nihaal <abdun.nihaal@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-12fs/ntfs3: Use __GFP_NOWARN allocation at ntfs_fill_super()Tetsuo Handa
syzbot is reporting too large allocation at ntfs_fill_super() [1], for a crafted filesystem can contain bogus inode->i_size. Add __GFP_NOWARN in order to avoid too large allocation warning, than exhausting memory by using kvmalloc(). Link: https://syzkaller.appspot.com/bug?extid=33f3faaa0c08744f7d40 [1] Reported-by: syzot <syzbot+33f3faaa0c08744f7d40@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-12fs/ntfs3: Use __GFP_NOWARN allocation at wnd_init()Tetsuo Handa
syzbot is reporting too large allocation at wnd_init() [1], for a crafted filesystem can become wnd->nwnd close to UINT_MAX. Add __GFP_NOWARN in order to avoid too large allocation warning, than exhausting memory by using kvcalloc(). Link: https://syzkaller.appspot.com/bug?extid=fa4648a5446460b7b963 [1] Reported-by: syzot <syzbot+fa4648a5446460b7b963@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-12fs/ntfs3: Fix slab-out-of-bounds read in ntfs_trim_fsAbdun Nihaal
Syzbot reports an out of bound access in ntfs_trim_fs. The cause of this is using a loop termination condition that compares window index (iw) with wnd->nbits instead of wnd->nwnd, due to which the index used for wnd->free_bits exceeds the size of the array allocated. Fix the loop condition. Fixes: 3f3b442b5ad2 ("fs/ntfs3: Add bitmap") Link: https://syzkaller.appspot.com/bug?extid=b892240eac461e488d51 Reported-by: syzbot+b892240eac461e488d51@syzkaller.appspotmail.com Signed-off-by: Abdun Nihaal <abdun.nihaal@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-12fs/ntfs3: Validate index root when initialize NTFS securityEdward Lo
This enhances the sanity check for $SDH and $SII while initializing NTFS security, guarantees these index root are legit. [ 162.459513] BUG: KASAN: use-after-free in hdr_find_e.isra.0+0x10c/0x320 [ 162.460176] Read of size 2 at addr ffff8880037bca99 by task mount/243 [ 162.460851] [ 162.461252] CPU: 0 PID: 243 Comm: mount Not tainted 6.0.0-rc7 #42 [ 162.461744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 162.462609] Call Trace: [ 162.462954] <TASK> [ 162.463276] dump_stack_lvl+0x49/0x63 [ 162.463822] print_report.cold+0xf5/0x689 [ 162.464608] ? unwind_get_return_address+0x3a/0x60 [ 162.465766] ? hdr_find_e.isra.0+0x10c/0x320 [ 162.466975] kasan_report+0xa7/0x130 [ 162.467506] ? _raw_spin_lock_irq+0xc0/0xf0 [ 162.467998] ? hdr_find_e.isra.0+0x10c/0x320 [ 162.468536] __asan_load2+0x68/0x90 [ 162.468923] hdr_find_e.isra.0+0x10c/0x320 [ 162.469282] ? cmp_uints+0xe0/0xe0 [ 162.469557] ? cmp_sdh+0x90/0x90 [ 162.469864] ? ni_find_attr+0x214/0x300 [ 162.470217] ? ni_load_mi+0x80/0x80 [ 162.470479] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 162.470931] ? ntfs_bread_run+0x190/0x190 [ 162.471307] ? indx_get_root+0xe4/0x190 [ 162.471556] ? indx_get_root+0x140/0x190 [ 162.471833] ? indx_init+0x1e0/0x1e0 [ 162.472069] ? fnd_clear+0x115/0x140 [ 162.472363] ? _raw_spin_lock_irqsave+0x100/0x100 [ 162.472731] indx_find+0x184/0x470 [ 162.473461] ? sysvec_apic_timer_interrupt+0x57/0xc0 [ 162.474429] ? indx_find_buffer+0x2d0/0x2d0 [ 162.474704] ? do_syscall_64+0x3b/0x90 [ 162.474962] dir_search_u+0x196/0x2f0 [ 162.475381] ? ntfs_nls_to_utf16+0x450/0x450 [ 162.475661] ? ntfs_security_init+0x3d6/0x440 [ 162.475906] ? is_sd_valid+0x180/0x180 [ 162.476191] ntfs_extend_init+0x13f/0x2c0 [ 162.476496] ? ntfs_fix_post_read+0x130/0x130 [ 162.476861] ? iput.part.0+0x286/0x320 [ 162.477325] ntfs_fill_super+0x11e0/0x1b50 [ 162.477709] ? put_ntfs+0x1d0/0x1d0 [ 162.477970] ? vsprintf+0x20/0x20 [ 162.478258] ? set_blocksize+0x95/0x150 [ 162.478538] get_tree_bdev+0x232/0x370 [ 162.478789] ? put_ntfs+0x1d0/0x1d0 [ 162.479038] ntfs_fs_get_tree+0x15/0x20 [ 162.479374] vfs_get_tree+0x4c/0x130 [ 162.479729] path_mount+0x654/0xfe0 [ 162.480124] ? putname+0x80/0xa0 [ 162.480484] ? finish_automount+0x2e0/0x2e0 [ 162.480894] ? putname+0x80/0xa0 [ 162.481467] ? kmem_cache_free+0x1c4/0x440 [ 162.482280] ? putname+0x80/0xa0 [ 162.482714] do_mount+0xd6/0xf0 [ 162.483264] ? path_mount+0xfe0/0xfe0 [ 162.484782] ? __kasan_check_write+0x14/0x20 [ 162.485593] __x64_sys_mount+0xca/0x110 [ 162.486024] do_syscall_64+0x3b/0x90 [ 162.486543] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 162.487141] RIP: 0033:0x7f9d374e948a [ 162.488324] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 162.489728] RSP: 002b:00007ffe30e73d18 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 [ 162.490971] RAX: ffffffffffffffda RBX: 0000561cdb43a060 RCX: 00007f9d374e948a [ 162.491669] RDX: 0000561cdb43a260 RSI: 0000561cdb43a2e0 RDI: 0000561cdb442af0 [ 162.492050] RBP: 0000000000000000 R08: 0000561cdb43a280 R09: 0000000000000020 [ 162.492459] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000561cdb442af0 [ 162.493183] R13: 0000561cdb43a260 R14: 0000000000000000 R15: 00000000ffffffff [ 162.493644] </TASK> [ 162.493908] [ 162.494214] The buggy address belongs to the physical page: [ 162.494761] page:000000003e38a3d5 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x37bc [ 162.496064] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) [ 162.497278] raw: 000fffffc0000000 ffffea00000df1c8 ffffea00000df008 0000000000000000 [ 162.498928] raw: 0000000000000000 0000000000240000 00000000ffffffff 0000000000000000 [ 162.500542] page dumped because: kasan: bad access detected [ 162.501057] [ 162.501242] Memory state around the buggy address: [ 162.502230] ffff8880037bc980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 162.502977] ffff8880037bca00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 162.503522] >ffff8880037bca80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 162.503963] ^ [ 162.504370] ffff8880037bcb00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 162.504766] ffff8880037bcb80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Signed-off-by: Edward Lo <edward.lo@ambergroup.io> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-11-12xattr: use rbtree for simple_xattrsChristian Brauner
A while ago Vasily reported that it is possible to set a large number of xattrs on inodes of filesystems that make use of the simple xattr infrastructure. This includes all kernfs-based filesystems that support xattrs (e.g., cgroupfs and tmpfs). Both cgroupfs and tmpfs can be mounted by unprivileged users in unprivileged containers and root in an unprivileged container can set an unrestricted number of security.* xattrs and privileged users can also set unlimited trusted.* xattrs. As there are apparently users that have a fairly large number of xattrs we should scale a bit better. Other xattrs such as user.* are restricted for kernfs-based instances to a fairly limited number. Using a simple linked list protected by a spinlock used for set, get, and list operations doesn't scale well if users use a lot of xattrs even if it's not a crazy number. There's no need to bring in the big guns like rhashtables or rw semaphores for this. An rbtree with a rwlock, or limited rcu semanics and seqlock is enough. It scales within the constraints we are working in. By far the most common operation is getting an xattr. Setting xattrs should be a moderately rare operation. And listxattr() often only happens when copying xattrs between files or together with the contents to a new file. Holding a lock across listxattr() is unproblematic because it doesn't list the values of xattrs. It can only be used to list the names of all xattrs set on a file. And the number of xattr names that can be listed with listxattr() is limited to XATTR_LIST_MAX aka 65536 bytes. If a larger buffer is passed then vfs_listxattr() caps it to XATTR_LIST_MAX and if more xattr names are found it will return -E2BIG. In short, the maximum amount of memory that can be retrieved via listxattr() is limited. Of course, the API is broken as documented on xattr(7) already. In the future we might want to address this but for now this is the world we live in and have lived for a long time. But it does indeed mean that once an application goes over XATTR_LIST_MAX limit of xattrs set on an inode it isn't possible to copy the file and include its xattrs in the copy unless the caller knows all xattrs or limits the copy of the xattrs to important ones it knows by name (At least for tmpfs, and kernfs-based filesystems. Other filesystems might provide ways of achieving this.). Bonus of this port to rbtree+rwlock is that we shrink the memory consumption for users of the simple xattr infrastructure. Also add proper kernel documentation to all the functions. A big thanks to Paul for his comments. Cc: Vasily Averin <vvs@openvz.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-11-11Merge tag 'mm-hotfixes-stable-2022-11-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "22 hotfixes. Eight are cc:stable and the remainder address issues which were introduced post-6.0 or which aren't considered serious enough to justify a -stable backport" * tag 'mm-hotfixes-stable-2022-11-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits) docs: kmsan: fix formatting of "Example report" mm/damon/dbgfs: check if rm_contexts input is for a real context maple_tree: don't set a new maximum on the node when not reusing nodes maple_tree: fix depth tracking in maple_state arch/x86/mm/hugetlbpage.c: pud_huge() returns 0 when using 2-level paging fs: fix leaked psi pressure state nilfs2: fix use-after-free bug of ns_writer on remount x86/traps: avoid KMSAN bugs originating from handle_bug() kmsan: make sure PREEMPT_RT is off Kconfig.debug: ensure early check for KMSAN in CONFIG_KMSAN_WARN x86/uaccess: instrument copy_from_user_nmi() kmsan: core: kmsan_in_runtime() should return true in NMI context mm: hugetlb_vmemmap: include missing linux/moduleparam.h mm/shmem: use page_mapping() to detect page cache for uffd continue mm/memremap.c: map FS_DAX device memory as decrypted Partly revert "mm/thp: carry over dirty bit when thp splits on pmd" nilfs2: fix deadlock in nilfs_count_free_blocks() mm/mmap: fix memory leak in mmap_region() hugetlbfs: don't delete error page from pagecache maple_tree: reorganize testing to restore module testing ...
2022-11-11Merge tag 'nfsd-6.1-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix an export leak - Fix a potential tracepoint crash * tag 'nfsd-6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: put the export reference in nfsd4_verify_deleg_dentry nfsd: fix use-after-free in nfsd_file_do_acquire tracepoint
2022-11-11Merge tag 'fixes_for_v6.1-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF fix from Jan Kara: "Fix a possible memory corruption with UDF" * tag 'fixes_for_v6.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Fix a slab-out-of-bounds write bug in udf_find_entry()
2022-11-11cifs: Fix connections leak when tlink setup failedZhang Xiaoxu
If the tlink setup failed, lost to put the connections, then the module refcnt leak since the cifsd kthread not exit. Also leak the fscache info, and for next mount with fsc, it will print the follow errors: CIFS: Cache volume key already in use (cifs,127.0.0.1:445,TEST) Let's check the result of tlink setup, and do some cleanup. Fixes: 56c762eb9bee ("cifs: Refactor out cifs_mount()") Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-11-11f2fs: fix to set flush_merge opt and show noflush_mergeYangtao Li
Some minor modifications to flush_merge and related parameters: 1.The FLUSH_MERGE opt is set by default only in non-ro mode. 2.When ro and merge are set at the same time, an error is reported. 3.Display noflush_merge mount opt. Suggested-by: Chao Yu <chao@kernel.org> Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-11f2fs: initialize locks earlier in f2fs_fill_super()Tetsuo Handa
syzbot is reporting lockdep warning at f2fs_handle_error() [1], for spin_lock(&sbi->error_lock) is called before spin_lock_init() is called. For safe locking in error handling, move initialization of locks (and obvious structures) in f2fs_fill_super() to immediately after memory allocation. Link: https://syzkaller.appspot.com/bug?extid=40642be9b7e0bb28e0df [1] Reported-by: syzbot <syzbot+40642be9b7e0bb28e0df@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: syzbot <syzbot+40642be9b7e0bb28e0df@syzkaller.appspotmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-11f2fs: optimize iteration over sparse directoriesChao Yu
Wei Chen reports a kernel bug as blew: INFO: task syz-executor.0:29056 blocked for more than 143 seconds. Not tainted 5.15.0-rc5 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-executor.0 state:D stack:14632 pid:29056 ppid: 6574 flags:0x00000004 Call Trace: __schedule+0x4a1/0x1720 schedule+0x36/0xe0 rwsem_down_write_slowpath+0x322/0x7a0 fscrypt_ioctl_set_policy+0x11f/0x2a0 __f2fs_ioctl+0x1a9f/0x5780 f2fs_ioctl+0x89/0x3a0 __x64_sys_ioctl+0xe8/0x140 do_syscall_64+0x34/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae Eric did some investigation on this issue, quoted from reply of Eric: "Well, the quality of this bug report has a lot to be desired (not on upstream kernel, reproducer is full of totally irrelevant stuff, not sent to the mailing list of the filesystem whose disk image is being fuzzed, etc.). But what is going on is that f2fs_empty_dir() doesn't consider the case of a directory with an extremely large i_size on a malicious disk image. Specifically, the reproducer mounts an f2fs image with a directory that has an i_size of 14814520042850357248, then calls FS_IOC_SET_ENCRYPTION_POLICY on it. That results in a call to f2fs_empty_dir() to check whether the directory is empty. f2fs_empty_dir() then iterates through all 3616826182336513 blocks the directory allegedly contains to check whether any contain anything. i_rwsem is held during this, so anything else that tries to take it will hang." In order to solve this issue, let's use f2fs_get_next_page_offset() to speed up iteration by skipping holes for all below functions: - f2fs_empty_dir - f2fs_readdir - find_in_level The way why we can speed up iteration was described in 'commit 3cf4574705b4 ("f2fs: introduce get_next_page_offset to speed up SEEK_DATA")'. Meanwhile, in f2fs_empty_dir(), let's use f2fs_find_data_page() instead f2fs_get_lock_data_page(), due to i_rwsem was held in caller of f2fs_empty_dir(), there shouldn't be any races, so it's fine to not lock dentry page during lookuping dirents in the page. Link: https://lore.kernel.org/lkml/536944df-a0ae-1dd8-148f-510b476e1347@kernel.org/T/ Reported-by: Wei Chen <harperchen1110@gmail.com> Cc: Eric Biggers <ebiggers@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-11f2fs: fix to avoid accessing uninitialized spinlockChao Yu
syzbot reports a kernel bug: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e3/0x2cb lib/dump_stack.c:106 assign_lock_key+0x22a/0x240 kernel/locking/lockdep.c:981 register_lock_class+0x287/0x9b0 kernel/locking/lockdep.c:1294 __lock_acquire+0xe4/0x1f60 kernel/locking/lockdep.c:4934 lock_acquire+0x1a7/0x400 kernel/locking/lockdep.c:5668 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:350 [inline] f2fs_save_errors fs/f2fs/super.c:3868 [inline] f2fs_handle_error+0x29/0x230 fs/f2fs/super.c:3896 f2fs_iget+0x215/0x4bb0 fs/f2fs/inode.c:516 f2fs_fill_super+0x47d3/0x7b50 fs/f2fs/super.c:4222 mount_bdev+0x26c/0x3a0 fs/super.c:1401 legacy_get_tree+0xea/0x180 fs/fs_context.c:610 vfs_get_tree+0x88/0x270 fs/super.c:1531 do_new_mount+0x289/0xad0 fs/namespace.c:3040 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount+0x2e3/0x3d0 fs/namespace.c:3568 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd F2FS-fs (loop1): Failed to read F2FS meta data inode The root cause is if sbi->error_lock may be accessed before its initialization, fix it. Link: https://lore.kernel.org/linux-f2fs-devel/0000000000007edb6605ecbb6442@google.com/T/#u Reported-by: syzbot+40642be9b7e0bb28e0df@syzkaller.appspotmail.com Fixes: 95fa90c9e5a7 ("f2fs: support recording errors into superblock") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-11f2fs: correct i_size change for atomic writesDaeho Jeong
We need to make sure i_size doesn't change until atomic write commit is successful and restore it when commit is failed. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-11f2fs: add proc entry to show discard_plist infoYangtao Li
This patch adds a new proc entry to show discard_plist information in more detail, which is very helpful to know the discard pend list count clearly. Such as: Discard pend list(Show diacrd_cmd count on each entry, .:not exist): 0 390 156 85 67 46 37 26 14 8 17 12 9 9 6 12 11 10 16 5 9 2 4 8 3 4 1 24 3 2 2 5 2 4 5 4 32 3 3 2 3 . 3 3 1 40 . 4 1 3 2 1 2 1 48 1 . 1 1 . 1 1 . 56 . 1 1 1 . 2 . 1 64 1 2 . . . . . . 72 . 1 . . . . . . 80 3 1 . . 1 1 . . 88 1 . . . 1 . . 1 ...... Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-11f2fs: allow to read node block after shutdownJaegeuk Kim
If block address is still alive, we should give a valid node block even after shutdown. Otherwise, we can see zero data when reading out a file. Cc: stable@vger.kernel.org Fixes: 83a3bfdb5a8a ("f2fs: indicate shutdown f2fs to allow unmount successfully") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-11ext2: Fix some kernel-doc warningsBo Liu
The current code provokes some kernel-doc warnings: fs/ext2/dir.c:417: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Signed-off-by: Bo Liu <liubo03@inspur.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz>
2022-11-10jfs: makes diUnmount/diMount in jfs_mount_rw atomicOleg Kanatov
jfs_mount_rw can call diUnmount and then diMount. These calls change the imap pointer. Between these two calls there may be calls of function jfs_lookup(). The jfs_lookup() function calls jfs_iget(), which, in turn calls diRead(). The latter references the imap pointer. That may cause diRead() to refer to a pointer freed in diUnmount(). This commit makes the calls to diUnmount()/diMount() atomic so that nothing will read the imap pointer until the whole remount is completed. Signed-off-by: Oleg Kanatov <okanatov@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>