summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2024-06-15kcov: don't lose track of remote references during softirqsAleksandr Nogikh
In kcov_remote_start()/kcov_remote_stop(), we swap the previous KCOV metadata of the current task into a per-CPU variable. However, the kcov_mode_enabled(mode) check is not sufficient in the case of remote KCOV coverage: current->kcov_mode always remains KCOV_MODE_DISABLED for remote KCOV objects. If the original task that has invoked the KCOV_REMOTE_ENABLE ioctl happens to get interrupted and kcov_remote_start() is called, it ultimately leads to kcov_remote_stop() NOT restoring the original KCOV reference. So when the task exits, all registered remote KCOV handles remain active forever. The most uncomfortable effect (at least for syzkaller) is that the bug prevents the reuse of the same /sys/kernel/debug/kcov descriptor. If we obtain it in the parent process and then e.g. drop some capabilities and continuously fork to execute individual programs, at some point current->kcov of the forked process is lost, kcov_task_exit() takes no action, and all KCOV_REMOTE_ENABLE ioctls calls from subsequent forks fail. And, yes, the efficiency is also affected if we keep on losing remote kcov objects. a) kcov_remote_map keeps on growing forever. b) (If I'm not mistaken), we're also not freeing the memory referenced by kcov->area. Fix it by introducing a special kcov_mode that is assigned to the task that owns a KCOV remote object. It makes kcov_mode_enabled() return true and yet does not trigger coverage collection in __sanitizer_cov_trace_pc() and write_comp_data(). [nogikh@google.com: replace WRITE_ONCE() with an ordinary assignment] Link: https://lkml.kernel.org/r/20240614171221.2837584-1-nogikh@google.com Link: https://lkml.kernel.org/r/20240611133229.527822-1-nogikh@google.com Fixes: 5ff3b30ab57d ("kcov: collect coverage from interrupts") Signed-off-by: Aleksandr Nogikh <nogikh@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Marco Elver <elver@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15mm: huge_memory: fix misused mapping_large_folio_support() for anon foliosRan Xiaokai
When I did a large folios split test, a WARNING "[ 5059.122759][ T166] Cannot split file folio to non-0 order" was triggered. But the test cases are only for anonmous folios. while mapping_large_folio_support() is only reasonable for page cache folios. In split_huge_page_to_list_to_order(), the folio passed to mapping_large_folio_support() maybe anonmous folio. The folio_test_anon() check is missing. So the split of the anonmous THP is failed. This is also the same for shmem_mapping(). We'd better add a check for both. But the shmem_mapping() in __split_huge_page() is not involved, as for anonmous folios, the end parameter is set to -1, so (head[i].index >= end) is always false. shmem_mapping() is not called. Also add a VM_WARN_ON_ONCE() in mapping_large_folio_support() for anon mapping, So we can detect the wrong use more easily. THP folios maybe exist in the pagecache even the file system doesn't support large folio, it is because when CONFIG_TRANSPARENT_HUGEPAGE is enabled, khugepaged will try to collapse read-only file-backed pages to THP. But the mapping does not actually support multi order large folios properly. Using /sys/kernel/debug/split_huge_pages to verify this, with this patch, large anon THP is successfully split and the warning is ceased. Link: https://lkml.kernel.org/r/202406071740485174hcFl7jRxncsHDtI-Pz-o@zte.com.cn Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages") Reviewed-by: Barry Song <baohua@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Cc: Michal Hocko <mhocko@kernel.org> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15lib/alloc_tag: fix RCU imbalance in pgalloc_tag_get()Suren Baghdasaryan
put_page_tag_ref() should be called only when get_page_tag_ref() returns a valid reference because only in that case get_page_tag_ref() enters RCU read section while put_page_tag_ref() will call rcu_read_unlock() even if the provided reference is NULL. Fix pgalloc_tag_get() which does not follow this rule causing RCU imbalance. Add a warning in put_page_tag_ref() to catch any future mistakes. Link: https://lkml.kernel.org/r/20240601233840.617458-1-surenb@google.com Fixes: cc92eba1c88b ("mm: fix non-compound multi-order memory accounting in __free_pages") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202405271029.6d2f9c4c-lkp@intel.com Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Kees Cook <keescook@chromium.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15Revert "mm: init_mlocked_on_free_v3"David Hildenbrand
There was insufficient review and no agreement that this is the right approach. There are serious flaws with the implementation that make processes using mlock() not even work with simple fork() [1] and we get reliable crashes when rebooting. Further, simply because we might be unmapping a single PTE of a large mlocked folio, we shouldn't zero out the whole folio. ... especially because the code can also *corrupt* urelated memory because kernel_init_pages(page, folio_nr_pages(folio)); Could end up writing outside of the actual folio if we work with a tail page. Let's revert it. Once there is agreement that this is the right approach, the issues were fixed and there was reasonable review and proper testing, we can consider it again. [1] https://lkml.kernel.org/r/4da9da2f-73e4-45fd-b62f-a8a513314057@redhat.com Link: https://lkml.kernel.org/r/20240605091710.38961-1-david@redhat.com Fixes: ba42b524a040 ("mm: init_mlocked_on_free_v3") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: David Wang <00107082@163.com> Closes: https://lore.kernel.org/lkml/20240528151340.4282-1-00107082@163.com/ Reported-by: Lance Yang <ioworker0@gmail.com> Closes: https://lkml.kernel.org/r/20240601140917.43562-1-ioworker0@gmail.com Acked-by: Lance Yang <ioworker0@gmail.com> Cc: York Jasper Niebuhr <yjnworkstation@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15ASoC: codecs: lpass: add support for v2.5 rx macroMark Brown
Merge series from Srinivas Kandagatla <srinivas.kandagatla@linaro.org>: This patchset adds support to reading codec version and also adds support for v2.5 codec version in rx macro. LPASS 2.5 and up versions have changes in some of the rx blocks which are required to get headset functional correctly. Tested this on SM8450, X13s and x1e80100 crd. This changes also fixes issue with sm8450, sm8550, sm8660 and x1e80100.
2024-06-15firewire: core: record card index in bus_reset_handle tracepoints eventTakashi Sakamoto
The bus reset event occurs in the bus managed by one of 1394 OHCI controller in Linux system, however the existing tracepoints events has the lack of data about it to distinguish the issued hardware from the others. This commit adds card_index member into event structure to store the index of host controller in use, and prints it. Link: https://lore.kernel.org/r/20240613131440.431766-9-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15firewire: core: record card index in tracepoinrts events derived from ↵Takashi Sakamoto
bus_reset_arrange_template The asynchronous transmission of phy packet is initiated on one of 1394 OHCI controller, however the existing tracepoints events has the lack of data about it. This commit adds card_index member into event structure to store the index of host controller in use, and prints it. Link: https://lore.kernel.org/r/20240613131440.431766-8-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15firewire: core: record card index in async_phy_inbound tracepoints eventTakashi Sakamoto
The asynchronous transmission of phy packet is initiated on one of 1394 OHCI controller, however the existing tracepoints events has the lack of data about it. This commit adds card_index member into event structure to store the index of host controller in use, and prints it. Link: https://lore.kernel.org/r/20240613131440.431766-7-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15firewire: core: record card index in async_phy_outbound_complete tracepoints ↵Takashi Sakamoto
event The asynchronous transmission of phy packet is initiated on one of 1394 OHCI controller, however the existing tracepoints events has the lack of data about it. This commit adds card_index member into event structure to store the index of host controller in use, and prints it. Link: https://lore.kernel.org/r/20240613131440.431766-6-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15firewire: core: record card index in async_phy_outbound_initiate tracepoints ↵Takashi Sakamoto
event The asynchronous transaction is initiated on one of 1394 OHCI controller, however the existing tracepoints events has the lack of data about it. This commit adds card_index member into event structure to store the index of host controller in use, and prints it. Link: https://lore.kernel.org/r/20240613131440.431766-5-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15firewire: core: record card index in tracepoinrts events derived from ↵Takashi Sakamoto
async_inbound_template The asynchronous transaction is initiated on one of 1394 OHCI controller, however the existing tracepoints events has the lack of data about it. This commit adds card_index member into event structure to store the index of host controller in use, and prints it. Link: https://lore.kernel.org/r/20240613131440.431766-4-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15firewire: core: record card index in tracepoinrts events derived from ↵Takashi Sakamoto
async_outbound_initiate_template The asynchronous transaction is initiated on one of 1394 OHCI controller, however the existing tracepoints events has the lack of data about it. This commit adds card_index member into event structure to store the index of host controller in use, and prints it. Link: https://lore.kernel.org/r/20240613131440.431766-3-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15firewire: core: record card index in tracepoinrts events derived from ↵Takashi Sakamoto
async_outbound_complete_template The asynchronous transaction is initiated on one of 1394 OHCI controller, however the existing tracepoints events has the lack of data about it. This commit adds card_index member into event structure to store the index of host controller in use, and prints it. Link: https://lore.kernel.org/r/20240613131440.431766-2-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-14net: stmmac: add select_pcs() platform methodRussell King (Oracle)
Allow platform drivers to provide their logic to select an appropriate PCS. Tested-by: Romain Gantois <romain.gantois@bootlin.com> Reviewed-by: Romain Gantois <romain.gantois@bootlin.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1sHhoM-00Fesu-8E@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14net/mlx5e: Support SWP-mode offload L4 csum calculationRahul Rameshbabu
Calculate the pseudo-header checksum for both IPSec transport mode and IPSec tunnel mode for mlx5 devices that do not implement a pure hardware checksum offload for L4 checksum calculation. Introduce a capability bit that identifies such mlx5 devices. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240613210036.1125203-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14net/mlx5: Correct TASR typo into TSARCosmin Ratiu
TSAR is the correct spelling (Transmit Scheduling ARbiter). Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240613210036.1125203-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14Merge tag 'vfio-v6.10-rc4' of https://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO fixes from Alex Williamson: "Fix long standing lockdep issue of using remap_pfn_range() from the vfio-pci fault handler for mapping device MMIO. Commit ba168b52bf8e ("mm: use rwsem assertion macros for mmap_lock") now exposes this as a warning forcing this to be addressed. remap_pfn_range() was used here to efficiently map the entire vma, but it really never should have been used in the fault handler and doesn't handle concurrency, which introduced complex locking. We also needed to track vmas mapping the device memory in order to zap those vmas when the memory is disabled resulting in a vma list. Instead of all that mess, setup an address space on the device fd such that we can use unmap_mapping_range() for zapping to avoid the tracking overhead and use the standard vmf_insert_pfn() to insert mappings on fault. For now we'll iterate the vma and opportunistically try to insert mappings for the entire vma. This aligns with typical use cases, but hopefully in the future we can drop the iterative approach and make use of huge_fault instead, once vmf_insert_pfn{pud,pmd}() learn to handle pfnmaps" * tag 'vfio-v6.10-rc4' of https://github.com/awilliam/linux-vfio: vfio/pci: Insert full vma on mmap'd MMIO fault vfio/pci: Use unmap_mapping_range() vfio: Create vfio_fs_type with inode per device
2024-06-14kunit: test: Add vm_mmap() allocation resource managerKees Cook
For tests that need to allocate using vm_mmap() (e.g. usercopy and execve), provide the interface to have the allocation tracked by KUnit itself. This requires bringing up a placeholder userspace mm. This combines my earlier attempt at this with Mark Rutland's version[1]. Normally alloc_mm() and arch_pick_mmap_layout() aren't exported for modules, so export these only for KUnit testing. Link: https://lore.kernel.org/lkml/20230321122514.1743889-2-mark.rutland@arm.com/ [1] Co-developed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-06-14Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-06-14 We've added 8 non-merge commits during the last 2 day(s) which contain a total of 9 files changed, 92 insertions(+), 11 deletions(-). The main changes are: 1) Silence a syzkaller splat under CONFIG_DEBUG_NET=y in pskb_pull_reason() triggered via __bpf_try_make_writable(), from Florian Westphal. 2) Fix removal of kfuncs during linking phase which then throws a kernel build warning via resolve_btfids about unresolved symbols, from Tony Ambardar. 3) Fix a UML x86_64 compilation failure from BPF as pcpu_hot symbol is not available on User Mode Linux, from Maciej Żenczykowski. 4) Fix a register corruption in reg_set_min_max triggering an invariant violation in BPF verifier, from Daniel Borkmann. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Harden __bpf_kfunc tag against linker kfunc removal compiler_types.h: Define __retain for __attribute__((__retain__)) bpf: Avoid splat in pskb_pull_reason bpf: fix UML x86_64 compile failure selftests/bpf: Add test coverage for reg_set_min_max handling bpf: Reduce stack consumption in check_stack_write_fixed_off bpf: Fix reg_set_min_max corruption of fake_reg MAINTAINERS: mailmap: Update Stanislav's email address ==================== Link: https://lore.kernel.org/r/20240614203223.26500-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14KVM: arm64: Add support for FFA_PARTITION_INFO_GETSebastian Ene
Handle the FFA_PARTITION_INFO_GET host call inside the pKVM hypervisor and copy the response message back to the host buffers. Signed-off-by: Sebastian Ene <sebastianene@google.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240613132035.1070360-3-sebastianene@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-14bpf: Track delta between "linked" registers.Alexei Starovoitov
Compilers can generate the code r1 = r2 r1 += 0x1 if r2 < 1000 goto ... use knowledge of r2 range in subsequent r1 operations So remember constant delta between r2 and r1 and update r1 after 'if' condition. Unfortunately LLVM still uses this pattern for loops with 'can_loop' construct: for (i = 0; i < 1000 && can_loop; i++) The "undo" pass was introduced in LLVM https://reviews.llvm.org/D121937 to prevent this optimization, but it cannot cover all cases. Instead of fighting middle end optimizer in BPF backend teach the verifier about this pattern. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240613013815.953-3-alexei.starovoitov@gmail.com
2024-06-14Merge tag 'block-6.10-20240614' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request via Keith: - Discard double free on error conditions (Chunguang) - Target Fixes (Daniel) - Namespace detachment regression fix (Keith) - Fix for an issue with flush requests and queuelist reuse (Chengming) - nbd sparse annotation fixes (Christoph) - unmap and free bio mapped data via submitter (Anuj) - loop discard/fallocate unsupported fix (Cyril) - Fix for the zoned write plugging added in this release (Damien) - sed-opal wrong address fix (Su) * tag 'block-6.10-20240614' of git://git.kernel.dk/linux: loop: Disable fallocate() zero and discard if not supported nvme: fix namespace removal list nbd: Remove __force casts nvmet: always initialize cqe.result nvmet-passthru: propagate status from id override functions nvme: avoid double free special payload block: unmap and free user mapped integrity via submitter block: fix request.queuelist usage in flush block: Optimize disk zone resource cleanup block: sed-opal: avoid possible wrong address reference in read_sed_opal_key()
2024-06-14Merge tag 'io_uring-6.10-20240614' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fixes from Jens Axboe: "Two fixes from Pavel headed to stable: - Ensure that the task state is correct before attempting to grab a mutex - Split cancel sequence flag into a separate variable, as it can get set by someone not owning the request (but holding the ctx lock)" * tag 'io_uring-6.10-20240614' of git://git.kernel.dk/linux: io_uring: fix cancellation overwriting req->flags io_uring/rsrc: don't lock while !TASK_RUNNING
2024-06-14Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Three obvious driver fixes and two core fixes. The two core fixes are to disable Command Duration Limits by default to fix an inconsistency in SATA and some USB devices. The other is to change the default read size for block zero to follow the device preference (some USB bridges preferring 16 byte commands don't have a translation for READ(10) and thus don't scan properly)" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: mpi3mr: Fix ATA NCQ priority support scsi: ufs: core: Quiesce request queues before checking pending cmds scsi: core: Disable CDL by default scsi: mpt3sas: Avoid test/set_bit() operating in non-allocated memory scsi: sd: Use READ(16) when reading block zero on large capacity disks
2024-06-14bpf: Harden __bpf_kfunc tag against linker kfunc removalTony Ambardar
BPF kfuncs are often not directly referenced and may be inadvertently removed by optimization steps during kernel builds, thus the __bpf_kfunc tag mitigates against this removal by including the __used macro. However, this macro alone does not prevent removal during linking, and may still yield build warnings (e.g. on mips64el): [...] LD vmlinux BTFIDS vmlinux WARN: resolve_btfids: unresolved symbol bpf_verify_pkcs7_signature WARN: resolve_btfids: unresolved symbol bpf_lookup_user_key WARN: resolve_btfids: unresolved symbol bpf_lookup_system_key WARN: resolve_btfids: unresolved symbol bpf_key_put WARN: resolve_btfids: unresolved symbol bpf_iter_task_next WARN: resolve_btfids: unresolved symbol bpf_iter_css_task_new WARN: resolve_btfids: unresolved symbol bpf_get_file_xattr WARN: resolve_btfids: unresolved symbol bpf_ct_insert_entry WARN: resolve_btfids: unresolved symbol bpf_cgroup_release WARN: resolve_btfids: unresolved symbol bpf_cgroup_from_id WARN: resolve_btfids: unresolved symbol bpf_cgroup_acquire WARN: resolve_btfids: unresolved symbol bpf_arena_free_pages NM System.map SORTTAB vmlinux OBJCOPY vmlinux.32 [...] Update the __bpf_kfunc tag to better guard against linker optimization by including the new __retain compiler macro, which fixes the warnings above. Verify the __retain macro with readelf by checking object flags for 'R': $ readelf -Wa kernel/trace/bpf_trace.o Section Headers: [Nr] Name Type Address Off Size ES Flg Lk Inf Al [...] [178] .text.bpf_key_put PROGBITS 00000000 6420 0050 00 AXR 0 0 8 [...] Key to Flags: [...] R (retain), D (mbind), p (processor specific) Fixes: 57e7c169cd6a ("bpf: Add __bpf_kfunc tag for marking kernel functions as kfuncs") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Yonghong Song <yonghong.song@linux.dev> Closes: https://lore.kernel.org/r/202401211357.OCX9yllM-lkp@intel.com/ Link: https://lore.kernel.org/bpf/ZlmGoT9KiYLZd91S@krava/T/ Link: https://lore.kernel.org/bpf/e9c64e9b5c073dabd457ff45128aabcab7630098.1717477560.git.Tony.Ambardar@gmail.com
2024-06-14compiler_types.h: Define __retain for __attribute__((__retain__))Tony Ambardar
Some code includes the __used macro to prevent functions and data from being optimized out. This macro implements __attribute__((__used__)), which operates at the compiler and IR-level, and so still allows a linker to remove objects intended to be kept. Compilers supporting __attribute__((__retain__)) can address this gap by setting the flag SHF_GNU_RETAIN on the section of a function/variable, indicating to the linker the object should be retained. This attribute is available since gcc 11, clang 13, and binutils 2.36. Provide a __retain macro implementing __attribute__((__retain__)), whose first user will be the '__bpf_kfunc' tag. [ Additional remark from discussion: Why is CONFIG_LTO_CLANG added here? The __used macro permits garbage collection at section level, so CLANG_LTO_CLANG without CONFIG_LD_DEAD_CODE_DATA_ELIMINATION should not change final section dynamics? The conditional guard was included to ensure consistent behaviour between __retain and other features forcing split sections. In particular, the same guard is used in vmlinux.lds.h to merge split sections where needed. For example, using __retain in LLVM builds without CONFIG_LTO was failing CI tests on kernel-patches/bpf because the kernel didn't boot properly. And in further testing, the kernel had no issues loading BPF kfunc modules with such split sections, so the module (partial) linking scripts were left alone. ] Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/ZlmGoT9KiYLZd91S@krava/T/ Link: https://lore.kernel.org/bpf/b31bca5a5e6765a0f32cc8c19b1d9cdbfaa822b5.1717477560.git.Tony.Ambardar@gmail.com
2024-06-14Merge tag 'acpi-6.10-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix a recent regression in the ACPI EC driver and make system suspend work on multiple platforms where StorageD3Enable _DSD is missing in the ACPI tables. Specifics: - Make the ACPI EC driver directly evaluate an "orphan" _REG method under the EC device, if present, which stopped being evaluated after the driver had started to install its EC address space handler at the root of the ACPI namespace (Rafael Wysocki) - Make more devices put NVMe storage devices into D3 at suspend to work around missing StorageD3Enable _DSD in the BIOS (Mario Limonciello)" * tag 'acpi-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: EC: Evaluate orphan _REG under EC device ACPI: x86: Force StorageD3Enable on more products
2024-06-14block: move integrity information into queue_limitsChristoph Hellwig
Move the integrity information into the queue limits so that it can be set atomically with other queue limits, and that the sysfs changes to the read_verify and write_generate flags are properly synchronized. This also allows to provide a more useful helper to stack the integrity fields, although it still is separate from the main stacking function as not all stackable devices want to inherit the integrity settings. Even with that it greatly simplifies the code in md and dm. Note that the integrity field is moved as-is into the queue limits. While there are good arguments for removing the separate blk_integrity structure, this would cause a lot of churn and might better be done at a later time if desired. However the integrity field in the queue_limits structure is now unconditional so that various ifdefs can be avoided or replaced with IS_ENABLED(). Given that tiny size of it that seems like a worthwhile trade off. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240613084839.1044015-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-14block: invert the BLK_INTEGRITY_{GENERATE,VERIFY} flagsChristoph Hellwig
Invert the flags so that user set values will be able to persist revalidating the integrity information once we switch the integrity information to queue_limits. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240613084839.1044015-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-14block: bypass the STABLE_WRITES flag for protection informationChristoph Hellwig
Currently registering a checksum-enabled (aka PI) integrity profile sets the QUEUE_FLAG_STABLE_WRITE flag, and unregistering it clears the flag. This can incorrectly clear the flag when the driver requires stable writes even without PI, e.g. in case of iSCSI or NVMe/TCP with data digest enabled. Fix this by looking at the csum_type directly in bdev_stable_writes and not setting the queue flag. Also remove the blk_queue_stable_writes helper as the only user in nvme wants to only look at the actual QUEUE_FLAG_STABLE_WRITE flag as it inherits the integrity configuration by other means. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240613084839.1044015-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-14block: remove the blk_integrity_profile structureChristoph Hellwig
Block layer integrity configuration is a bit complex right now, as it indirects through operation vectors for a simple two-dimensional configuration: a) the checksum type of none, ip checksum, crc, crc64 b) the presence or absence of a reference tag Remove the integrity profile, and instead add a separate csum_type flag which replaces the existing ip-checksum field and a new flag that indicates the presence of the reference tag. This removes up to two layers of indirect calls, remove the need to offload the no-op verification of non-PI metadata to a workqueue and generally simplifies the code. The downside is that block/t10-pi.c now has to be built into the kernel when CONFIG_BLK_DEV_INTEGRITY is supported. Given that both nvme and SCSI require t10-pi.ko, it is loaded for all usual configurations that enabled CONFIG_BLK_DEV_INTEGRITY already, though. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240613084839.1044015-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-14block: add special APIs for run-time disabling of discard and friendsChristoph Hellwig
A few drivers optimistically try to support discard, write zeroes and secure erase and disable the features from the I/O completion handler if the hardware can't support them. This disable can't be done using the atomic queue limits API because the I/O completion handlers can't take sleeping locks or freeze the queue. Keep the existing clearing of the relevant field to zero, but replace the old blk_queue_max_* APIs with new disable APIs that force the value to 0. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240531074837.1648501-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-14block: remove unused queue limits APIChristoph Hellwig
Remove all APIs that are unused now that sd and sr have been converted to the atomic queue limits API. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240531074837.1648501-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-14platform: arm64: add Lenovo Yoga C630 WOS EC driverDmitry Baryshkov
Lenovo Yoga C630 WOS is a laptop using Snapdragon 850 SoC. Like many laptops it uses an embedded controller (EC) to perform various platform operations, including, but not limited, to Type-C port control or power supply handlng. Add the driver for the EC, that creates devices for UCSI and power supply devices. Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20240614-yoga-ec-driver-v7-2-9f0b9b40ae76@linaro.org [ij: added #include <linux/cleanup.h>] Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-06-14HID: bpf: make part of struct hid_device writableBenjamin Tissoires
It is useful to change the name, the phys and/or the uniq of a struct hid_device during .rdesc_fixup(). For example, hid-uclogic.ko changes the uniq to store the firmware version to differentiate between 2 devices sharing the same PID. In the same way, changing the device name is useful when the device export 3 nodes, all with the same name. Link: https://lore.kernel.org/r/20240608-hid_bpf_struct_ops-v3-16-6ac6ade58329@kernel.org Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-06-14Documentation: HID: amend HID-BPF for struct_opsBenjamin Tissoires
Now that we are using struct_ops, the docs need to be changed. Link: https://lore.kernel.org/r/20240608-hid_bpf_struct_ops-v3-10-6ac6ade58329@kernel.org Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-06-14HID: bpf: remove tracing HID-BPF capabilityBenjamin Tissoires
We can now rely on struct_ops as we cleared the users in-tree. Link: https://lore.kernel.org/r/20240608-hid_bpf_struct_ops-v3-8-6ac6ade58329@kernel.org Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-06-14HID: bpf: implement HID-BPF through bpf_struct_opsBenjamin Tissoires
We do this implementation in several steps to not have the CI failing: - first (this patch), we add struct_ops while keeping the existing infra available - then we change the selftests, the examples and the existing in-tree HID-BPF programs - then we remove the existing trace points making old HID-BPF obsolete There are a few advantages of struct_ops over tracing: - compatibility with sleepable programs (for hid_hw_raw_request() in a later patch) - a lot simpler in the kernel: it's a simple rcu protected list - we can add more parameters to the function called without much trouble - the "attach" is now generic through BPF-core: the caller just needs to set hid_id and flags before calling __load(). - all the BPF tough part is not handled in BPF-core through generic processing - hid_bpf_ctx is now only writable where it needs be Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20240608-hid_bpf_struct_ops-v3-3-6ac6ade58329@kernel.org Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-06-14HID: rename struct hid_bpf_ops into hid_opsBenjamin Tissoires
Those operations are the ones from HID, not HID-BPF, and I'd like to reuse hid_bpf_ops as the user facing struct_ops API. Link: https://lore.kernel.org/r/20240608-hid_bpf_struct_ops-v3-1-6ac6ade58329@kernel.org Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-06-14leds: class: Add flag to avoid automatic renaming of LED devicesThomas Weißschuh
Add a mechanism for drivers to opt-out of the automatic device renaming on conflicts. Those drivers will provide their own conflict resolution. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20240526-cros_ec-kbd-led-framework-v3-2-ee577415a521@weissschuh.net Signed-off-by: Lee Jones <lee@kernel.org>
2024-06-14dt-bindings: clock: sophgo: add clkgen for SG2042Chen Wang
Add bindings for the clock generator of divider/mux and gates working for other subsystem than RP subsystem for Sophgo SG2042. Signed-off-by: Chen Wang <unicorn_wang@outlook.com> Reviewed-by: Rob Herring <robh@kernel.org>
2024-06-14dt-bindings: clock: sophgo: add RP gate clocks for SG2042Chen Wang
Add bindings for the gate clocks of RP subsystem for Sophgo SG2042. Signed-off-by: Chen Wang <unicorn_wang@outlook.com> Reviewed-by: Rob Herring <robh@kernel.org>
2024-06-14dt-bindings: clock: sophgo: add pll clocks for SG2042Chen Wang
Add bindings for the pll clocks for Sophgo SG2042. Signed-off-by: Chen Wang <unicorn_wang@outlook.com> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Guo Ren <guoren@kernel.org>
2024-06-13io_uring: fix cancellation overwriting req->flagsPavel Begunkov
Only the current owner of a request is allowed to write into req->flags. Hence, the cancellation path should never touch it. Add a new field instead of the flag, move it into the 3rd cache line because it should always be initialised. poll_refs can move further as polling is an involved process anyway. It's a minimal patch, in the future we can and should find a better place for it and remove now unused REQ_F_CANCEL_SEQ. Fixes: 521223d7c229f ("io_uring/cancel: don't default to setting req->work.cancel_seq") Cc: stable@vger.kernel.org Reported-by: Li Shi <sl1589472800@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6827b129f8f0ad76fa9d1f0a773de938b240ffab.1718323430.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-13scsi: core: Introduce the BLIST_SKIP_IO_HINTS flagBart Van Assche
Prepare for skipping the IO Advice Hints Grouping mode page for USB storage devices. Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Joao Machado <jocrismachado@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Christian Heusel <christian@heusel.eu> Cc: stable@vger.kernel.org Fixes: 4f53138fffc2 ("scsi: sd: Translate data lifetime information") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240613211828.2077477-2-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-13Merge branch 'mana-shared' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Leon Romanovsky says: ==================== net: mana: Allow variable size indirection table Like we talked, I created new shared branch for this patch: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=mana-shared * 'mana-shared' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: net: mana: Allow variable size indirection table ==================== Link: https://lore.kernel.org/all/20240612183051.GE4966@unreal Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13video/logo: Remove linux_serial_image commentsGeert Uytterhoeven
The last user of the serial_console ASCII image was removed in v2.1.115. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Helge Deller <deller@gmx.de>
2024-06-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts, no adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13ACPI: NUMA: Consolidate header includesThorsten Blum
The header file acpi/acpi_numa.h is included whether CONFIG_ACPI is defined or not. Include it only once before the #ifdef/#else/#endif preprocessor directives and fix the following make includecheck warning: acpi/acpi_numa.h is included more than once Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-13ima: Avoid blocking in RCU read-side critical sectionGUO Zihua
A panic happens in ima_match_policy: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 PGD 42f873067 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 5 PID: 1286325 Comm: kubeletmonit.sh Kdump: loaded Tainted: P Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 RIP: 0010:ima_match_policy+0x84/0x450 Code: 49 89 fc 41 89 cf 31 ed 89 44 24 14 eb 1c 44 39 7b 18 74 26 41 83 ff 05 74 20 48 8b 1b 48 3b 1d f2 b9 f4 00 0f 84 9c 01 00 00 <44> 85 73 10 74 ea 44 8b 6b 14 41 f6 c5 01 75 d4 41 f6 c5 02 74 0f RSP: 0018:ff71570009e07a80 EFLAGS: 00010207 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000200 RDX: ffffffffad8dc7c0 RSI: 0000000024924925 RDI: ff3e27850dea2000 RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffffabfce739 R10: ff3e27810cc42400 R11: 0000000000000000 R12: ff3e2781825ef970 R13: 00000000ff3e2785 R14: 000000000000000c R15: 0000000000000001 FS: 00007f5195b51740(0000) GS:ff3e278b12d40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 0000000626d24002 CR4: 0000000000361ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ima_get_action+0x22/0x30 process_measurement+0xb0/0x830 ? page_add_file_rmap+0x15/0x170 ? alloc_set_pte+0x269/0x4c0 ? prep_new_page+0x81/0x140 ? simple_xattr_get+0x75/0xa0 ? selinux_file_open+0x9d/0xf0 ima_file_check+0x64/0x90 path_openat+0x571/0x1720 do_filp_open+0x9b/0x110 ? page_counter_try_charge+0x57/0xc0 ? files_cgroup_alloc_fd+0x38/0x60 ? __alloc_fd+0xd4/0x250 ? do_sys_open+0x1bd/0x250 do_sys_open+0x1bd/0x250 do_syscall_64+0x5d/0x1d0 entry_SYSCALL_64_after_hwframe+0x65/0xca Commit c7423dbdbc9e ("ima: Handle -ESTALE returned by ima_filter_rule_match()") introduced call to ima_lsm_copy_rule within a RCU read-side critical section which contains kmalloc with GFP_KERNEL. This implies a possible sleep and violates limitations of RCU read-side critical sections on non-PREEMPT systems. Sleeping within RCU read-side critical section might cause synchronize_rcu() returning early and break RCU protection, allowing a UAF to happen. The root cause of this issue could be described as follows: | Thread A | Thread B | | |ima_match_policy | | | rcu_read_lock | |ima_lsm_update_rule | | | synchronize_rcu | | | | kmalloc(GFP_KERNEL)| | | sleep | ==> synchronize_rcu returns early | kfree(entry) | | | | entry = entry->next| ==> UAF happens and entry now becomes NULL (or could be anything). | | entry->action | ==> Accessing entry might cause panic. To fix this issue, we are converting all kmalloc that is called within RCU read-side critical section to use GFP_ATOMIC. Fixes: c7423dbdbc9e ("ima: Handle -ESTALE returned by ima_filter_rule_match()") Cc: stable@vger.kernel.org Signed-off-by: GUO Zihua <guozihua@huawei.com> Acked-by: John Johansen <john.johansen@canonical.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> [PM: fixed missing comment, long lines, !CONFIG_IMA_LSM_RULES case] Signed-off-by: Paul Moore <paul@paul-moore.com>