summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-04-17erofs: introduce on-disk format for long xattr name prefixesJingbo Xu
Besides the predefined xattr name prefixes, introduces long xattr name prefixes, which work similarly as the predefined name prefixes, except that they are user specified. It is especially useful for use cases together with overlayfs like Composefs model, which introduces diverse xattr values with only a few common xattr names (trusted.overlay.redirect, trusted.overlay.digest, and maybe more in the future). That makes the existing predefined prefixes ineffective in both image size and runtime performance. When a user specified long xattr name prefix is used, only the trailing part of the xattr name apart from the long xattr name prefix will be stored in erofs_xattr_entry.e_name. e_name is empty if the xattr name matches exactly as the long xattr name prefix. All long xattr prefixes are stored in the packed or meta inode, which depends if fragments feature is enabled or not. For each long xattr name prefix, the on-disk format is kept as the same as the unique metadata format: ALIGN({__le16 len, data}, 4), where len represents the total size of struct erofs_xattr_long_prefix, followed by data of struct erofs_xattr_long_prefix itself. Each erofs_xattr_long_prefix keeps predefined prefixes (base_index) and the remaining prefix string without the trailing '\0'. Two fields are introduced to the on-disk superblock, where xattr_prefix_count represents the total number of the long xattr name prefixes recorded, and xattr_prefix_start represents the start offset of recorded name prefixes in the packed/meta inode divided by 4. When referring to a long xattr name prefix, the highest bit (bit 7) of erofs_xattr_entry.e_name_index is set, while the lower bits (bit 0-6) as a whole represents the index of the referred long name prefix among all long xattr name prefixes. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230407141710.113882-5-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17erofs: move packed inode out of the compression partJingbo Xu
packed inode could be used in more scenarios which are independent of compression in the future. For example, packed inode could be used to keep extra long xattr prefixes with the help of following patches. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230407141710.113882-4-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17erofs: keep meta inode into erofs_bufGao Xiang
So that erofs_read_metadata() can read metadata from other inodes (e.g. packed inode) as well. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230407141710.113882-2-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17erofs: initialize packed inode after root inode is assignedJingbo Xu
As commit 8f7acdae2cd4 ("staging: erofs: kill all failure handling in fill_super()"), move the initialization of packed inode after root inode is assigned, so that the iput() in .put_super() is adequate as the failure handling. Otherwise, iput() is also needed in .kill_sb(), in case of the mounting fails halfway. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Fixes: b15b2e307c3a ("erofs: support on-disk compressed fragments data") Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230407141710.113882-3-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17erofs: stop parsing non-compact HEAD index if clusterofs is invalidGao Xiang
Syzbot generated a crafted image [1] with a non-compact HEAD index of clusterofs 33024 while valid numbers should be 0 ~ lclustersize-1, which causes the following unexpected behavior as below: BUG: unable to handle page fault for address: fffff52101a3fff9 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 23ffed067 P4D 23ffed067 PUD 0 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 4398 Comm: kworker/u5:1 Not tainted 6.3.0-rc6-syzkaller-g09a9639e56c0 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/30/2023 Workqueue: erofs_worker z_erofs_decompressqueue_work RIP: 0010:z_erofs_decompress_queue+0xb7e/0x2b40 ... Call Trace: <TASK> z_erofs_decompressqueue_work+0x99/0xe0 process_one_work+0x8f6/0x1170 worker_thread+0xa63/0x1210 kthread+0x270/0x300 ret_from_fork+0x1f/0x30 Note that normal images or images using compact indexes are not impacted. Let's fix this now. [1] https://lore.kernel.org/r/000000000000ec75b005ee97fbaa@google.com Reported-and-tested-by: syzbot+aafb3f37cfeb6534c4ac@syzkaller.appspotmail.com Fixes: 02827e1796b3 ("staging: erofs: add erofs_map_blocks_iter") Fixes: 152a333a5895 ("staging: erofs: add compacted compression indexes support") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230410173714.104604-1-hsiangkao@linux.alibaba.com
2023-04-17erofs: don't warn ztailpacking feature anymoreYue Hu
The ztailpacking feature has been merged for a year, it has been mostly stable now. Signed-off-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230227084457.3510-1-zbestahu@gmail.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17erofs: simplify erofs_xattr_generic_get()Jingbo Xu
erofs_xattr_generic_get() won't be called from xattr handlers other than user/trusted/security xattr handler, and thus there's no need of extra checking. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Link: https://lore.kernel.org/r/20230330082910.125374-4-jefflexu@linux.alibaba.com Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17erofs: rename init_inode_xattrs with erofs_ prefixJingbo Xu
Rename init_inode_xattrs() to erofs_init_inode_xattrs() without logic change. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230330082910.125374-3-jefflexu@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@coolpad.com> Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17erofs: move several xattr helpers into xattr.cJingbo Xu
Move xattrblock_addr() and xattrblock_offset() helpers into xattr.c, as they are not used outside of xattr.c. inlinexattr_header_size() has only one caller, and thus make it inlined into the caller directly. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230330082910.125374-2-jefflexu@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@coolpad.com> Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17erofs: tidy up EROFS on-disk namingGao Xiang
- Get rid of all "vle" (variable-length extents) expressions since they only expand overall name lengths unnecessarily; - Rename COMPRESSION_LEGACY to COMPRESSED_FULL; - Move on-disk directory definitions ahead of compression; - Drop unused extended attribute definitions; - Move inode ondisk union `i_u` out as `union erofs_inode_i_u`. No actual logical change. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230331063149.25611-1-hsiangkao@linux.alibaba.com
2023-04-17erofs: support flattened block device for multi-blob imagesJia Zhu
In order to support mounting multi-blobs container image as a single block device, add flattened block device feature for EROFS. In this mode, all meta/data contents will be mapped into one block space. User could compose a block device(by nbd/ublk/virtio-blk/ vhost-user-blk) from multiple sources and mount the block device by EROFS directly. It can reduce the number of block devices used, and it's also benefits in both VM file passthrough and distributed storage scenarios. You can test this using the method mentioned by: https://github.com/dragonflyoss/image-service/pull/1139 1. Compose a (nbd)block device from multi-blobs. 2. Mount EROFS on mntdir/. 3. Compare the md5sum between source dir and mntdir/. Later, we could also use it to refer original tar blobs. Signed-off-by: Jia Zhu <zhujia.zj@bytedance.com> Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Tested-by: Jiang Liu <gerry@linux.alibaba.com> Link: https://lore.kernel.org/r/20230302071751.48425-1-zhujia.zj@bytedance.com [ Gao Xiang: refine commit message and use erofs_pos(). ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17erofs: set block size to the on-disk block sizeJingbo Xu
Set the block size to that specified in on-disk superblock. Also remove the hard constraint of PAGE_SIZE block size for the uncompressed device backend. This constraint is temporarily remained for compressed device and fscache backend, as there is more work needed to handle the condition where the block size is not equal to PAGE_SIZE. It is worth noting that the on-disk block size is read prior to erofs_superblock_csum_verify(), as the read block size is needed in the latter. Besides, later we are going to make erofs refer to tar data blobs (which is 512-byte aligned) for OCI containers, where the block size is 512 bytes. In this case, the 512-byte block size may not be adequate for a directory to contain enough dirents. To fix this, we are also going to introduce directory block size independent on the block size. Due to we have already supported block size smaller than PAGE_SIZE now, disable all these images with such separated directory block size until we supported this feature later. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230313135309.75269-3-jefflexu@linux.alibaba.com [ Gao Xiang: update documentation. ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17erofs: avoid hardcoded blocksize for subpage block supportJingbo Xu
As the first step of converting hardcoded blocksize to that specified in on-disk superblock, convert all call sites of hardcoded blocksize to sb->s_blocksize except for: 1) use sbi->blkszbits instead of sb->s_blocksize in erofs_superblock_csum_verify() since sb->s_blocksize has not been updated with the on-disk blocksize yet when the function is called. 2) use inode->i_blkbits instead of sb->s_blocksize in erofs_bread(), since the inode operated on may be an anonymous inode in fscache mode. Currently the anonymous inode is allocated from an anonymous mount maintained in erofs, while in the near future we may allocate anonymous inodes from a generic API directly and thus have no access to the anonymous inode's i_sb. Thus we keep the block size in i_blkbits for anonymous inodes in fscache mode. Be noted that this patch only gets rid of the hardcoded blocksize, in preparation for actually setting the on-disk block size in the following patch. The hard limit of constraining the block size to PAGE_SIZE still exists until the next patch. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230313135309.75269-2-jefflexu@linux.alibaba.com [ Gao Xiang: fold a patch to fix incorrect truncated offsets. ] Link: https://lore.kernel.org/r/20230413035734.15457-1-zhujia.zj@bytedance.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-16Merge tag 'powerpc-6.3-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: - A fix for NUMA distance handling in the pseries SCM (pmem) driver. Thanks to Aneesh Kumar K.V. * tag 'powerpc-6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/papr_scm: Update the NUMA distance table for the target node
2023-04-16Merge tag 'kbuild-fixes-v6.3-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Drop debug info from purgatory objects again - Document that kernel.org provides prebuilt LLVM toolchains - Give up handling untracked files for source package builds - Avoid creating corrupted cpio when KBUILD_BUILD_TIMESTAMP is given with a pre-epoch data. - Change panic_show_mem() to a macro to handle variable-length argument - Compress tarballs on-the-fly again * tag 'kbuild-fixes-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: do not create intermediate *.tar for tar packages kbuild: do not create intermediate *.tar for source tarballs kbuild: merge cmd_archive_linux and cmd_archive_perf init/initramfs: Fix argument forwarding to panic() in panic_show_mem() initramfs: Check negative timestamp to prevent broken cpio archive kbuild: give up untracked files for source package builds Documentation/llvm: Add a note about prebuilt kernel.org toolchains purgatory: fix disabling debug info
2023-04-16Merge tag '6.3-rc6-ksmbd-server-fix' of git://git.samba.org/ksmbdLinus Torvalds
Pull ksmbd server fix from Steve French: "smb311 server preauth integrity negotiate context parsing fix (check for out of bounds access)" * tag '6.3-rc6-ksmbd-server-fix' of git://git.samba.org/ksmbd: ksmbd: avoid out of bounds access in decode_preauth_ctxt()
2023-04-16selftest, ptrace: Add selftest for syscall user dispatch config apiGregory Price
Validate that the following new ptrace requests work as expected * PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG returns the contents of task->syscall_dispatch * PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG sets the contents of task->syscall_dispatch Signed-off-by: Gregory Price <gregory.price@memverge.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230407171834.3558-5-gregory.price@memverge.com
2023-04-16ptrace: Provide set/get interface for syscall user dispatchGregory Price
The syscall user dispatch configuration can only be set by the task itself, but lacks a ptrace set/get interface which makes it impossible to implement checkpoint/restore for it. Add the required ptrace requests and the get/set functions in the syscall user dispatch code to make that possible. Signed-off-by: Gregory Price <gregory.price@memverge.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20230407171834.3558-4-gregory.price@memverge.com
2023-04-16syscall_user_dispatch: Untag selector address before access_ok()Gregory Price
To support checkpoint/restart, ptrace must be able to set the selector of the tracee. The selector is a user pointer that may be subject to memory tagging extensions on some architectures (namely ARM MTE). access_ok() clears memory tags for tagged addresses if the current task has memory tagging enabled. This obviously fails when ptrace modifies the selector of a tracee when tracer and tracee do not have the same memory tagging enabled state. Solve this by untagging the selector address before handing it to access_ok(), like other ptrace functions which modify tracee pointers do. Obviously a tracer can set an invalid selector address for the tracee, but that's independent of tagging and a general capability of the tracer. Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Gregory Price <gregory.price@memverge.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/all/ZCWXE04nLZ4pXEtM@arm.com/ Link: https://lore.kernel.org/r/20230407171834.3558-3-gregory.price@memverge.com
2023-04-16syscall_user_dispatch: Split up set_syscall_user_dispatch()Gregory Price
syscall user dispatch configuration is not covered by checkpoint/restore. To prepare for ptrace access to the syscall user dispatch configuration, move the inner working of set_syscall_user_dispatch() into a helper function. Make the helper function task pointer based and let set_syscall_user_dispatch() invoke it with task=current. No functional change. Signed-off-by: Gregory Price <gregory.price@memverge.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20230407171834.3558-2-gregory.price@memverge.com
2023-04-16PCI/MSI: Remove over-zealous hardware size check in pci_msix_validate_entries()Thomas Gleixner
pci_msix_validate_entries() validates the entries array which is handed in by the caller for a MSI-X interrupt allocation. Aside of consistency failures it also detects a failure when the size of the MSI-X hardware table in the device is smaller than the size of the entries array. That's wrong for the case of range allocations where the caller provides the minimum and the maximum number of vectors to allocate, when the hardware size is greater or equal than the mininum, but smaller than the maximum. Remove the hardware size check completely from that function and just ensure that the entires array up to the maximum size is consistent. The limitation and range checking versus the hardware size happens independently of that afterwards anyway because the entries array is optional. Fixes: 4644d22eb673 ("PCI/MSI: Validate MSI-X contiguous restriction early") Reported-by: David Laight <David.Laight@aculab.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87v8i3sg62.ffs@tglx
2023-04-16kbuild: do not create intermediate *.tar for tar packagesMasahiro Yamada
Commit 05e96e96a315 ("kbuild: use git-archive for source package creation") split the compression as a separate step to factor out the common build rules. With the previous commit, we got back to the situation where source tarballs are compressed on-the-fly. There is no reason to keep the separate compression rules. Generate the comressed tar packages directly. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org>
2023-04-16kbuild: do not create intermediate *.tar for source tarballsMasahiro Yamada
Since commit 05e96e96a315 ("kbuild: use git-archive for source package creation"), a source tarball is created in two steps; create *.tar file then compress it. I split the compression as a separate rule because I just thought 'git archive' supported only gzip. For other compression algorithms, I could pipe the two commands: $ git archive HEAD | xz > linux.tar.xz I read git-archive(1) carefully, and I realized GIT had provided a more elegant way: $ git -c tar.tar.xz.command=xz archive -o linux.tar.xz HEAD This commit uses 'tar.tar.*.command' configuration to specify the compression backend so we can compress a source tarball on-the-fly. GIT commit 767cf4579f0e ("archive: implement configurable tar filters") is more than a decade old, so it should be available on almost all build environments. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org>
2023-04-16kbuild: merge cmd_archive_linux and cmd_archive_perfMasahiro Yamada
The two commands, cmd_archive_linux and cmd_archive_perf, are similar. Merge them to make it easier to add more changes to the git-archive command. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org>
2023-04-16init/initramfs: Fix argument forwarding to panic() in panic_show_mem()Benjamin Gray
Forwarding variadic argument lists can't be done by passing a va_list to a function with signature foo(...) (as panic() has). It ends up interpreting the va_list itself as a single argument instead of iterating it. printf() happily accepts it of course, leading to corrupt output. Convert panic_show_mem() to a macro to allow forwarding the arguments. The function is trivial enough that it's easier than trying to introduce a vpanic() variant. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-04-16initramfs: Check negative timestamp to prevent broken cpio archiveBenjamin Gray
Similar to commit 4c9d410f32b3 ("initramfs: Check timestamp to prevent broken cpio archive"), except asserts that the timestamp is non-negative. This can happen when the KBUILD_BUILD_TIMESTAMP is a value before UNIX epoch, which may be set when making reproducible builds that don't want to look like they use a valid date. While support for dates before 1970 might not be supported, this is more about preventing undetected CPIO corruption. The printf's use a minimum length format specifier, and will happily make the field longer than 8 characters if they need to. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Tested-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-04-15Merge tag '6.3-rc6-smb311-client-negcontext-fix' of ↵Linus Torvalds
git://git.samba.org/sfrench/cifs-2.6 Pull cifs fix from Steve French: "Small client fix for better checking for smb311 negotiate context overflows, also marked for stable" * tag '6.3-rc6-smb311-client-negcontext-fix' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix negotiate context parsing
2023-04-15Merge tag 'ubifs-for-linus-6.3-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBI fixes from Richard Weinberger: - Fix failure to attach when vid_hdr offset equals the (sub)page size - Fix for a deadlock in UBI's worker thread * tag 'ubifs-for-linus-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size ubi: Fix deadlock caused by recursively holding work_sem
2023-04-15cifs: fix negotiate context parsingDavid Disseldorp
smb311_decode_neg_context() doesn't properly check against SMB packet boundaries prior to accessing individual negotiate context entries. This is due to the length check omitting the eight byte smb2_neg_context header, as well as incorrect decrementing of len_of_ctxts. Fixes: 5100d8a3fe03 ("SMB311: Improve checking of negotiate security contexts") Reported-by: Volker Lendecke <vl@samba.org> Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-04-15debugobject: Prevent init race with static objectsThomas Gleixner
Statically initialized objects are usually not initialized via the init() function of the subsystem. They are special cased and the subsystem provides a function to validate whether an object which is not yet tracked by debugobjects is statically initialized. This means the object is started to be tracked on first use, e.g. activation. This works perfectly fine, unless there are two concurrent operations on that object. Schspa decoded the problem: T0 T1 debug_object_assert_init(addr) lock_hash_bucket() obj = lookup_object(addr); if (!obj) { unlock_hash_bucket(); - > preemption lock_subsytem_object(addr); activate_object(addr) lock_hash_bucket(); obj = lookup_object(addr); if (!obj) { unlock_hash_bucket(); if (is_static_object(addr)) init_and_track(addr); lock_hash_bucket(); obj = lookup_object(addr); obj->state = ACTIVATED; unlock_hash_bucket(); subsys function modifies content of addr, so static object detection does not longer work. unlock_subsytem_object(addr); if (is_static_object(addr)) <- Fails debugobject emits a warning and invokes the fixup function which reinitializes the already active object in the worst case. This race exists forever, but was never observed until mod_timer() got a debug_object_assert_init() added which is outside of the timer base lock held section right at the beginning of the function to cover the lockless early exit points too. Rework the code so that the lookup, the static object check and the tracking object association happens atomically under the hash bucket lock. This prevents the issue completely as all callers are serialized on the hash bucket lock and therefore cannot observe inconsistent state. Fixes: 3ac7fe5a4aab ("infrastructure to debug (dynamic) objects") Reported-by: syzbot+5093ba19745994288b53@syzkaller.appspotmail.com Debugged-by: Schspa Shi <schspa@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Link: https://syzkaller.appspot.com/bug?id=22c8a5938eab640d1c6bcc0e3dc7be519d878462 Link: https://lore.kernel.org/lkml/20230303161906.831686-1-schspa@gmail.com Link: https://lore.kernel.org/r/87zg7dzgao.ffs@tglx
2023-04-15Merge tag 'i2c-for-6.3-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Just two driver fixes" * tag 'i2c-for-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: ocores: generate stop condition after timeout in polling mode i2c: mchp-pci1xxxx: Update Timing registers
2023-04-15Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fix from James Bottomley: "One small fix to SCSI Enclosure Services to fix a regression caused by another recent fix" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ses: Handle enclosure with just a primary component gracefully
2023-04-15Merge tag 'block-6.3-2023-04-14' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fix from Jens Axboe: "A single NVMe quirk entry addition" * tag 'block-6.3-2023-04-14' of git://git.kernel.dk/linux: nvme-pci: add NVME_QUIRK_BOGUS_NID for T-FORCE Z330 SSD
2023-04-15Merge tag 'io_uring-6.3-2023-04-14' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fix from Jens Axboe: "Just a small tweak to when task_work needs redirection, marked for stable as well" * tag 'io_uring-6.3-2023-04-14' of git://git.kernel.dk/linux: io_uring: complete request via task work in case of DEFER_TASKRUN
2023-04-15genirq: Update affinity of secondary threadsJohn Keeping
For interrupts with secondary threads, the affinity is applied when the thread is created but if the interrupts affinity is changed later only the primary thread is updated. Update the secondary thread's affinity as well to keep all the interrupts activity on the assigned CPUs. Signed-off-by: John Keeping <john@metanate.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230406180857.588682-1-john@metanate.com
2023-04-15softirq: Add trace points for tasklet entry/exitLingutla Chandrasekhar
Tasklets are supposed to finish their work quickly and should not block the current running process, but it is not guaranteed that they do so. Currently softirq_entry/exit can be used to analyse the total tasklets execution time, but that's not helpful to track individual tasklets execution time. That makes it hard to identify tasklet functions, which take more time than expected. Add tasklet_entry/exit trace point support to track individual tasklet execution. Trivial usage example: # echo 1 > /sys/kernel/debug/tracing/events/irq/tasklet_entry/enable # echo 1 > /sys/kernel/debug/tracing/events/irq/tasklet_exit/enable # cat /sys/kernel/debug/tracing/trace # tracer: nop # # entries-in-buffer/entries-written: 4/4 #P:4 # # _-----=> irqs-off/BH-disabled # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> migrate-disable # |||| / delay # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | <idle>-0 [003] ..s1. 314.011428: tasklet_entry: tasklet=0xffffa01ef8db2740 function=tcp_tasklet_func <idle>-0 [003] ..s1. 314.011432: tasklet_exit: tasklet=0xffffa01ef8db2740 function=tcp_tasklet_func <idle>-0 [003] ..s1. 314.017369: tasklet_entry: tasklet=0xffffa01ef8db2740 function=tcp_tasklet_func <idle>-0 [003] ..s1. 314.017371: tasklet_exit: tasklet=0xffffa01ef8db2740 function=tcp_tasklet_func Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org> Signed-off-by: J. Avila <elavila@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20230407230526.1685443-1-jstultz@google.com [elavila: Port to android-mainline] [jstultz: Rebased to upstream, cut unused trace points, added comments for the tracepoints, reworded commit]
2023-04-14Merge tag 'riscv-for-linus-6.3-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - A fix for a missing fence when generating the NOMMU sigreturn trampoline - A set of fixes for early DTB handling of reserved memory nodes * tag 'riscv-for-linus-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: No need to relocate the dtb as it lies in the fixmap region riscv: Do not set initial_boot_params to the linear address of the dtb riscv: Move early dtb mapping into the fixmap region riscv: add icache flush for nommu sigreturn trampoline
2023-04-14Merge tag 'acpi-6.3-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These add two ACPI-related quirks: - Add a quirk to force StorageD3Enable on AMD Picasso systems (Mario Limonciello) - Add an ACPI IRQ override quirk for ASUS ExpertBook B1502CBA (Paul Menzel)" * tag 'acpi-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: resource: Skip IRQ override on ASUS ExpertBook B1502CBA ACPI: x86: utils: Add Picasso to the list for forcing StorageD3Enable
2023-04-14Merge tag 'pm-6.3-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Make the amd-pstate cpufreq driver take all of the possible combinations of the 'old' and 'new' status values correctly while changing the operation mode via sysfs (Wyes Karny)" * tag 'pm-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: amd-pstate: Fix amd_pstate mode switch
2023-04-14Merge tag 'thermal-6.3-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fix from Rafael Wysocki: "Modify the Intel thermal throttling code to avoid updating unsupported status clearing mask bits which causes the kernel to complain about unchecked MSR access (Srinivas Pandruvada)" * tag 'thermal-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: intel: Avoid updating unsupported THERM_STATUS_CLEAR mask bits
2023-04-14Merge tag 'sound-6.3-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of small fixes. At this time, quite a few fixes for the old PCI drivers are found. Although they are not regression fixes, I took these as they are materials for stable kernels. In addition, a couple of regression fixes and another couple of HD-audio quirks are included" * tag 'sound-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/hdmi: disable KAE for Intel DG2 ALSA: hda/realtek: Add quirks for Lenovo Z13/Z16 Gen2 ALSA: hda: patch_realtek: add quirk for Asus N7601ZM ALSA: firewire-tascam: add missing unwind goto in snd_tscm_stream_start_duplex() ALSA: emu10k1: don't create old pass-through playback device on Audigy ALSA: emu10k1: fix capture interrupt handler unlinking ALSA: hda/sigmatel: fix S/PDIF out on Intel D*45* motherboards ALSA: hda/sigmatel: add pin overrides for Intel DP45SG motherboard ALSA: i2c/cs8427: fix iec958 mixer control deactivation
2023-04-14selftests/resctrl: Fix incorrect error return on test completeReinette Chatre
An error snuck in between two recent conflicting changes: Until recently ->setup() used negative values to indicate normal test termination. This was changed in commit fa10366cc6f4 ("selftests/resctrl: Allow ->setup() to return errors") that transitioned ->setup() to use negative values to indicate errors and a new END_OF_TESTS to indicate normal termination. commit 42e3b093eb7c ("selftests/resctrl: Fix set up schemata with 100% allocation on first run in MBM test") continued to use negative return to indicate normal test termination. Fix mbm_setup() to use the new END_OF_TESTS to indicate error-free test termination. Fixes: 42e3b093eb7c ("selftests/resctrl: Fix set up schemata with 100% allocation on first run in MBM test") Reported-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/lkml/bb65cce8-54d7-68c5-ef19-3364ec95392a@linux.intel.com/ Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-04-14Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "We had a fairly slow cycle on the rc side this time, here are the accumulated fixes, mostly in drivers: - irdma should not generate extra completions during flushing - Fix several memory leaks - Do not get confused in irdma's iwarp mode if IPv6 is present - Correct a link speed calculation in mlx5 - Increase the EQ/WQ limits on erdma as they are too small for big applications - Use the right math for erdma's inline mtt feature - Make erdma probing more robust to boot time ordering differences - Fix a KMSAN crash in CMA due to uninitialized qkey" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/core: Fix GID entry ref leak when create_ah fails RDMA/cma: Allow UD qp_type to join multicast only RDMA/erdma: Defer probing if netdevice can not be found RDMA/erdma: Inline mtt entries into WQE if supported RDMA/erdma: Update default EQ depth to 4096 and max_send_wr to 8192 RDMA/erdma: Fix some typos IB/mlx5: Add support for 400G_8X lane speed RDMA/irdma: Add ipv4 check to irdma_find_listener() RDMA/irdma: Increase iWARP CM default rexmit count RDMA/irdma: Fix memory leak of PBLE objects RDMA/irdma: Do not generate SW completions for NOPs
2023-04-14s390/bpf: Fix bpf_arch_text_poke() with new_addr == NULLIlya Leoshkevich
Thomas Richter reported a crash in linux-next with a backtrace similar to the following one: [<0000000000000000>] 0x0 ([<000000000031a182>] bpf_trace_run4+0xc2/0x218) [<00000000001d59f4>] __bpf_trace_sched_switch+0x1c/0x28 [<0000000000c44a3a>] __schedule+0x43a/0x890 [<0000000000c44ef8>] schedule+0x68/0x110 [<0000000000c4e5ca>] do_nanosleep+0xa2/0x168 [<000000000026e7fe>] hrtimer_nanosleep+0xf6/0x1c0 [<000000000026eb6e>] __s390x_sys_nanosleep+0xb6/0xf0 [<0000000000c3b81c>] __do_syscall+0x1e4/0x208 [<0000000000c50510>] system_call+0x70/0x98 Last Breaking-Event-Address: [<000003ff7fda1814>] bpf_prog_65e887c70a835bbf_on_switch+0x1a4/0x1f0 The problem is that bpf_arch_text_poke() with new_addr == NULL is susceptible to the following race condition: T1 T2 ----------------- ------------------- plt.target = NULL entry: brcl 0xf,plt entry.mask = 0 lgrl %r1,plt.target br %r1 Fix by setting PLT target to the instruction following `brcl 0xf,plt` instead of 0. This way T2 will simply resume the execution of the eBPF program, which is the desired effect of passing new_addr == NULL. Fixes: f1d5df84cd8c ("s390/bpf: Implement bpf_arch_text_poke()") Reported-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/bpf/20230414154755.184502-1-iii@linux.ibm.com
2023-04-14Merge branch 'acpi-x86'Rafael J. Wysocki
Merge a quirk to force StorageD3Enable on AMD Picasso systems (Mario Limonciello). * acpi-x86: ACPI: x86: utils: Add Picasso to the list for forcing StorageD3Enable
2023-04-14io_uring: complete request via task work in case of DEFER_TASKRUNMing Lei
So far io_req_complete_post() only covers DEFER_TASKRUN by completing request via task work when the request is completed from IOWQ. However, uring command could be completed from any context, and if io uring is setup with DEFER_TASKRUN, the command is required to be completed from current context, otherwise wait on IORING_ENTER_GETEVENTS can't be wakeup, and may hang forever. The issue can be observed on removing ublk device, but turns out it is one generic issue for uring command & DEFER_TASKRUN, so solve it in io_uring core code. Fixes: e6aeb2721d3b ("io_uring: complete all requests in task context") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-block/b3fc9991-4c53-9218-a8cc-5b4dd3952108@kernel.dk/ Reported-by: Jens Axboe <axboe@kernel.dk> Cc: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-14Merge branch 'nvme-6.3' of git://git.infradead.org/nvme into block-6.3Jens Axboe
Pull NVMe fix from Christoph. * 'nvme-6.3' of git://git.infradead.org/nvme: nvme-pci: add NVME_QUIRK_BOGUS_NID for T-FORCE Z330 SSD
2023-04-14Merge tag 'qcom-arm64-fixes-for-6.3-2' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes A few more Qualcomm ARM64 DeviceTree fixes for 6.3 The GPIO polarity of the WSA881x shutdown GPIO was inconsistent and had to be corrected in the driver, this fixes the polarity in the DeviceTree for QRB5165 RB5, SM8250 MTP, Samsung Galaxy Book 2 and Lenovo Yoga C630. The recent rearrangement of nodes among the IPQ8074 accidentally enabled the PCIe PHYs, rather than the PCIe controllers. This is being corrected, to restore PCIe functionality. PMK8280 PON node has the wrong compatible, which recently caused the driver to stop probing. This is corrected and the required "pbs" region is added. With support for HBR3 introduced, it's noted that SC7280 Herobrine devices are having trouble running at this rate. This drops the claim that it's supported, until further analysis can be done. * tag 'qcom-arm64-fixes-for-6.3-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: arm64: dts: qcom: sc7280: remove hbr3 support on herobrine boards arm64: dts: qcom: sc8280xp-pmics: fix pon compatible and registers arm64: dts: qcom: ipq8074-hk10: enable QMP device, not the PHY node arm64: dts: qcom: ipq8074-hk01: enable QMP device, not the PHY node arm64: dts: qcom: qrb5165-rb5: Use proper WSA881x shutdown GPIO polarity arm64: dts: qcom: sm8250-mtp: Use proper WSA881x shutdown GPIO polarity arm64: dts: qcom: sdm850-samsung-w737: Use proper WSA881x shutdown GPIO polarity arm64: dts: qcom: sdm850-lenovo-yoga-c630: Use proper WSA881x shutdown GPIO polarity Link: https://lore.kernel.org/r/20230410153850.4752-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-04-14Merge tag 'v6.3-rockchip-dtsfixes1' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes Lower sd card speeds for two boards to make them run more reliable, missing 32k clock definition for Anbric xx3 devices, missing cache-levels for rk3588, fixed rk3326-board display supplies and more dt-schema fixes. * tag 'v6.3-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: arm64: dts: rockchip: correct panel supplies on some rk3326 boards arm64: dts: rockchip: use just "port" in panel on RockPro64 arm64: dts: rockchip: use just "port" in panel on Pinebook Pro arm64: dts: rockchip: Remove non-existing pwm-delay-us property arm64: dts: rockchip: Add clk_rtc_32k to Anbernic xx3 Devices arm64: dts: rockchip: add rk3588 cache level information arm64: dts: rockchip: Lower SD card speed on rk3399 Pinebook Pro arm64: dts: rockchip: Lower sd speed on rk3566-soquartz ARM: dts: rockchip: fix a typo error for rk3288 spdif node arm64: dts: rockchip: Fix rk3399 GICv3 ITS node name Link: https://lore.kernel.org/r/10559306.CDJkKcVGEf@phil Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-04-14firmware/psci: demote suspend-mode warning to info levelJohan Hovold
On some Qualcomm platforms, like SC8280XP, the attempt to set PC mode during boot fails with PSCI_RET_DENIED and since commit 998fcd001feb ("firmware/psci: Print a warning if PSCI doesn't accept PC mode") this is now logged at warning level: psci: failed to set PC mode: -3 As there is nothing users can do about the firmware behaving this way, demote the warning to info level and clearly mark it as a firmware bug: psci: [Firmware Bug]: failed to set PC mode: -3 Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Acked-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>