summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-07-12udmabuf: convert udmabuf driver to use foliosVivek Kasireddy
This is mainly a preparatory patch to use memfd_pin_folios() API for pinning folios. Using folios instead of pages makes sense as the udmabuf driver needs to handle both shmem and hugetlb cases. And, using the memfd_pin_folios() API makes this easier as we no longer need to separately handle shmem vs hugetlb cases in the udmabuf driver. Note that, the function vmap_udmabuf() still needs a list of pages; so, we collect all the head pages into a local array in this case. Other changes in this patch include the addition of helpers for checking the memfd seals and exporting dmabuf. Moving code from udmabuf_create() into these helpers improves readability given that udmabuf_create() is a bit long. Link: https://lkml.kernel.org/r/20240624063952.1572359-8-vivek.kasireddy@intel.com Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Junxiao Chang <junxiao.chang@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12udmabuf: add back support for mapping hugetlb pagesVivek Kasireddy
A user or admin can configure a VMM (Qemu) Guest's memory to be backed by hugetlb pages for various reasons. However, a Guest OS would still allocate (and pin) buffers that are backed by regular 4k sized pages. In order to map these buffers and create dma-bufs for them on the Host, we first need to find the hugetlb pages where the buffer allocations are located and then determine the offsets of individual chunks (within those pages) and use this information to eventually populate a scatterlist. Testcase: default_hugepagesz=2M hugepagesz=2M hugepages=2500 options were passed to the Host kernel and Qemu was launched with these relevant options: qemu-system-x86_64 -m 4096m.... -device virtio-gpu-pci,max_outputs=1,blob=true,xres=1920,yres=1080 -display gtk,gl=on -object memory-backend-memfd,hugetlb=on,id=mem1,size=4096M -machine memory-backend=mem1 Replacing -display gtk,gl=on with -display gtk,gl=off above would exercise the mmap handler. Link: https://lkml.kernel.org/r/20240624063952.1572359-7-vivek.kasireddy@intel.com Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> (v2) Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Junxiao Chang <junxiao.chang@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12udmabuf: use vmf_insert_pfn and VM_PFNMAP for handling mmapVivek Kasireddy
Add VM_PFNMAP to vm_flags in the mmap handler to ensure that the mappings would be managed without using struct page. And, in the vm_fault handler, use vmf_insert_pfn to share the page's pfn to userspace instead of directly sharing the page (via struct page *). Link: https://lkml.kernel.org/r/20240624063952.1572359-6-vivek.kasireddy@intel.com Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Junxiao Chang <junxiao.chang@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12udmabuf: add CONFIG_MMU dependencyArnd Bergmann
There is no !CONFIG_MMU version of vmf_insert_pfn(): arm-linux-gnueabi-ld: drivers/dma-buf/udmabuf.o: in function `udmabuf_vm_fault': udmabuf.c:(.text+0xaa): undefined reference to `vmf_insert_pfn' Link: https://lkml.kernel.org/r/20240624063952.1572359-5-vivek.kasireddy@intel.com Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Junxiao Chang <junxiao.chang@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12mm/gup: introduce memfd_pin_folios() for pinning memfd foliosVivek Kasireddy
For drivers that would like to longterm-pin the folios associated with a memfd, the memfd_pin_folios() API provides an option to not only pin the folios via FOLL_PIN but also to check and migrate them if they reside in movable zone or CMA block. This API currently works with memfds but it should work with any files that belong to either shmemfs or hugetlbfs. Files belonging to other filesystems are rejected for now. The folios need to be located first before pinning them via FOLL_PIN. If they are found in the page cache, they can be immediately pinned. Otherwise, they need to be allocated using the filesystem specific APIs and then pinned. [akpm@linux-foundation.org: improve the CONFIG_MMU=n situation, per SeongJae] [vivek.kasireddy@intel.com: return -EINVAL if the end offset is greater than the size of memfd] Link: https://lkml.kernel.org/r/IA0PR11MB71850525CBC7D541CAB45DF1F8DB2@IA0PR11MB7185.namprd11.prod.outlook.com Link: https://lkml.kernel.org/r/20240624063952.1572359-4-vivek.kasireddy@intel.com Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> (v2) Reviewed-by: David Hildenbrand <david@redhat.com> (v3) Reviewed-by: Christoph Hellwig <hch@lst.de> (v6) Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Junxiao Chang <junxiao.chang@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12mm/gup: introduce check_and_migrate_movable_folios()Vivek Kasireddy
This helper is the folio equivalent of check_and_migrate_movable_pages(). Therefore, all the rules that apply to check_and_migrate_movable_pages() also apply to this one as well. Currently, this helper is only used by memfd_pin_folios(). This patch also includes changes to rename and convert the internal functions collect_longterm_unpinnable_pages() and migrate_longterm_unpinnable_pages() to work on folios. As a result, check_and_migrate_movable_pages() is now a wrapper around check_and_migrate_movable_folios(). Link: https://lkml.kernel.org/r/20240624063952.1572359-3-vivek.kasireddy@intel.com Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Peter Xu <peterx@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Junxiao Chang <junxiao.chang@intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12mm/gup: introduce unpin_folio/unpin_folios helpersVivek Kasireddy
Patch series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios", v16. Currently, some drivers (e.g, Udmabuf) that want to longterm-pin the pages/folios associated with a memfd, do so by simply taking a reference on them. This is not desirable because the pages/folios may reside in Movable zone or CMA block. Therefore, having drivers use memfd_pin_folios() API ensures that the folios are appropriately pinned via FOLL_PIN for longterm DMA. This patchset also introduces a few helpers and converts the Udmabuf driver to use folios and memfd_pin_folios() API to longterm-pin the folios for DMA. Two new Udmabuf selftests are also included to test the driver and the new API. This patch (of 9): These helpers are the folio versions of unpin_user_page/unpin_user_pages. They are currently only useful for unpinning folios pinned by memfd_pin_folios() or other associated routines. However, they could find new uses in the future, when more and more folio-only helpers are added to GUP. We should probably sanity check the folio as part of unpin similar to how it is done in unpin_user_page/unpin_user_pages but we cannot cleanly do that at the moment without also checking the subpage. Therefore, sanity checking needs to be added to these routines once we have a way to determine if any given folio is anon-exclusive (via a per folio AnonExclusive flag). Link: https://lkml.kernel.org/r/20240624063952.1572359-1-vivek.kasireddy@intel.com Link: https://lkml.kernel.org/r/20240624063952.1572359-2-vivek.kasireddy@intel.com Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Suggested-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Peter Xu <peterx@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Junxiao Chang <junxiao.chang@intel.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12mm/zswap: use only one pool in zswapChengming Zhou
Zswap uses 32 pools to workaround the locking scalability problem in zswap backends (mainly zsmalloc nowadays), which brings its own problems like memory waste and more memory fragmentation. Testing results show that we can have near performance with only one pool in zswap after changing zsmalloc to use per-size_class lock instead of pool spinlock. Testing kernel build (make bzImage -j32) on tmpfs with memory.max=1GB, and zswap shrinker enabled with 10GB swapfile on ext4. real user sys 6.10.0-rc3 138.18 1241.38 1452.73 6.10.0-rc3-onepool 149.45 1240.45 1844.69 6.10.0-rc3-onepool-perclass 138.23 1242.37 1469.71 And do the same testing using zbud, which shows a little worse performance as expected since we don't do any locking optimization for zbud. I think it's acceptable since zsmalloc became a lot more popular than other backends, and we may want to support only zsmalloc in the future. real user sys 6.10.0-rc3-zbud 138.23 1239.58 1430.09 6.10.0-rc3-onepool-zbud 139.64 1241.37 1516.59 [chengming.zhou@linux.dev: fix error handling in zswap_pool_create(), per Dan Carpenter] Link: https://lkml.kernel.org/r/20240621-zsmalloc-lock-mm-everything-v2-2-d30e9cd2b793@linux.dev [chengming.zhou@linux.dev: fix error handling again in zswap_pool_create(), per Yosry] Link: https://lkml.kernel.org/r/20240625-zsmalloc-lock-mm-everything-v3-2-ad941699cb61@linux.dev Link: https://lkml.kernel.org/r/20240617-zsmalloc-lock-mm-everything-v1-2-5e5081ea11b3@linux.dev Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Acked-by: Yosry Ahmed <yosryahmed@google.com> Cc: Chengming Zhou <zhouchengming@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12mm/zsmalloc: change back to per-size_class lockChengming Zhou
Patch series "mm/zsmalloc: change back to per-size_class lock, v2". Commit c0547d0b6a4b ("zsmalloc: consolidate zs_pool's migrate_lock and size_class's locks") changed per-size_class lock to pool spinlock to prepare reclaim support in zsmalloc. Then reclaim support in zsmalloc had been dropped in favor of LRU reclaim in zswap, but this locking change had been left there. Obviously, the scalability of pool spinlock is worse than per-size_class. And we have a workaround that using 32 pools in zswap to avoid this scalability problem, which brings its own problems like memory waste and more memory fragmentation. So this series changes back to use per-size_class lock and using testing data in much stressed situation to verify that we can use only one pool in zswap. Note we only test and care about the zsmalloc backend, which makes sense now since zsmalloc became a lot more popular than other backends. Testing kernel build (make bzImage -j32) on tmpfs with memory.max=1GB, and zswap shrinker enabled with 10GB swapfile on ext4. real user sys 6.10.0-rc3 138.18 1241.38 1452.73 6.10.0-rc3-onepool 149.45 1240.45 1844.69 6.10.0-rc3-onepool-perclass 138.23 1242.37 1469.71 We can see from "sys" column that per-size_class locking with only one pool in zswap can have near performance with the current 32 pools. This patch (of 2): This patch is almost the revert of the commit c0547d0b6a4b ("zsmalloc: consolidate zs_pool's migrate_lock and size_class's locks"), which changed to use a global pool->lock instead of per-size_class lock and pool->migrate_lock, was preparation for suppporting reclaim in zsmalloc. Then reclaim in zsmalloc had been dropped in favor of LRU reclaim in zswap. In theory, per-size_class is more fine-grained than the pool->lock, since a pool can have many size_classes. As for the additional pool->migrate_lock, only free() and map() need to grab it to access stable handle to get zspage, and only in read lock mode. Link: https://lkml.kernel.org/r/20240625-zsmalloc-lock-mm-everything-v3-0-ad941699cb61@linux.dev Link: https://lkml.kernel.org/r/20240621-zsmalloc-lock-mm-everything-v2-0-d30e9cd2b793@linux.dev Link: https://lkml.kernel.org/r/20240617-zsmalloc-lock-mm-everything-v1-0-5e5081ea11b3@linux.dev Link: https://lkml.kernel.org/r/20240617-zsmalloc-lock-mm-everything-v1-1-5e5081ea11b3@linux.dev Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12mm/hugetlb.c: undo errant changeAndrew Morton
During conflict resolution a line was unintentionally removed by a ksm.c patch. Link: https://lkml.kernel.org/r/85b0d694-d1ac-8e7a-2e50-1edc03eee21a@google.com Fixes: ac90c56bbd73 ("mm/ksm: refactor out try_to_merge_with_zero_page()") Reported-by: Hugh Dickins <hughd@google.com> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12PCI: Extend ACS configurabilityVidya Sagar
PCIe ACS settings control the level of isolation and the possible P2P paths between devices. With greater isolation the kernel will create smaller iommu_groups and with less isolation there is more HW that can achieve P2P transfers. From a virtualization perspective all devices in the same iommu_group must be assigned to the same VM as they lack security isolation. There is no way for the kernel to automatically know the correct ACS settings for any given system and workload. Existing command line options (e.g., disable_acs_redir) allow only for large scale change, disabling all isolation, but this is not sufficient for more complex cases. Add a kernel command-line option 'config_acs' to directly control all the ACS bits for specific devices, which allows the operator to setup the right level of isolation to achieve the desired P2P configuration. The definition is future proof; when new ACS bits are added to the spec the open syntax can be extended. ACS needs to be setup early in the kernel boot as the ACS settings affect how iommu_groups are formed. iommu_group formation is a one time event during initial device discovery, so changing ACS bits after kernel boot can result in an inaccurate view of the iommu_groups compared to the current isolation configuration. ACS applies to PCIe Downstream Ports and multi-function devices. The default ACS settings are strict and deny any direct traffic between two functions. This results in the smallest iommu_group the HW can support. Frequently these values result in slow or non-working P2PDMA. ACS offers a range of security choices controlling how traffic is allowed to go directly between two devices. Some popular choices: - Full prevention - Translated requests can be direct, with various options - Asymmetric direct traffic, A can reach B but not the reverse - All traffic can be direct Along with some other less common ones for special topologies. The intention is that this option would be used with expert knowledge of the HW capability and workload to achieve the desired configuration. Link: https://lore.kernel.org/r/20240625153150.159310-1-vidyas@nvidia.com Signed-off-by: Vidya Sagar <vidyas@nvidia.com> [bhelgaas: add example, tidy printk formats] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-12PCI: Add missing bridge lock to pci_bus_lock()Dan Williams
One of the true positives that the cfg_access_lock lockdep effort identified is this sequence: WARNING: CPU: 14 PID: 1 at drivers/pci/pci.c:4886 pci_bridge_secondary_bus_reset+0x5d/0x70 RIP: 0010:pci_bridge_secondary_bus_reset+0x5d/0x70 Call Trace: <TASK> ? __warn+0x8c/0x190 ? pci_bridge_secondary_bus_reset+0x5d/0x70 ? report_bug+0x1f8/0x200 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x18/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? pci_bridge_secondary_bus_reset+0x5d/0x70 pci_reset_bus+0x1d8/0x270 vmd_probe+0x778/0xa10 pci_device_probe+0x95/0x120 Where pci_reset_bus() users are triggering unlocked secondary bus resets. Ironically pci_bus_reset(), several calls down from pci_reset_bus(), uses pci_bus_lock() before issuing the reset which locks everything *but* the bridge itself. For the same motivation as adding: bridge = pci_upstream_bridge(dev); if (bridge) pci_dev_lock(bridge); to pci_reset_function() for the "bus" and "cxl_bus" reset cases, add pci_dev_lock() for @bus->self to pci_bus_lock(). Link: https://lore.kernel.org/r/171711747501.1628941.15217746952476635316.stgit@dwillia2-xfh.jf.intel.com Reported-by: Imre Deak <imre.deak@intel.com> Closes: http://lore.kernel.org/r/6657833b3b5ae_14984b29437@dwillia2-xfh.jf.intel.com.notmuch Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Keith Busch <kbusch@kernel.org> [bhelgaas: squash in recursive locking deadlock fix from Keith Busch: https://lore.kernel.org/r/20240711193650.701834-1-kbusch@meta.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Hans de Goede <hdegoede@redhat.com> Tested-by: Kalle Valo <kvalo@kernel.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
2024-07-12dt-bindings: ata: ahci-fsl-qoriq: add fsl,ls1046a-ahci and fsl,ls1012a-ahciFrank Li
Add missing documented compatible strings 'fsl,ls1046a-ahci' and 'fsl,ls1012a-ahci'. Allow 'fsl,ls1012a-ahci' to fallback to 'fsl,ls1043a-ahci'. Fix below CHECK_DTB warnings arch/arm64/boot/dts/freescale/fsl-ls1012a-qds.dtb: sata@3200000: compatible:0: 'fsl,ls1012a-ahci' is not one of ['fsl,ls1021a-ahci', 'fsl,ls1043a-ahci', 'fsl,ls1028a-ahci', 'fsl,ls1088a-ahci', 'fsl,ls2080a-ahci', 'fsl,lx2160a-ahci'] arch/arm64/boot/dts/freescale/fsl-ls1012a-qds.dtb: sata@3200000: compatible: ['fsl,ls1012a-ahci', 'fsl,ls1043a-ahci'] is too long Fixes: e58e12c5c34c ("dt-bindings: ata: ahci-fsl-qoriq: convert to yaml format") Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20240712185740.3819170-1-Frank.Li@nxp.com [cassel: rewrap commit log lines, capitalize SATA] Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-07-12docs: driver-model: platform: update the definition of platform_driverEric Biggers
Update the documented struct platform_driver to match the code. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240711200421.11428-1-ebiggers@kernel.org
2024-07-12selftests/bpf: Test for null-pointer-deref bugfix in resolve_prog_type()Tengda Wu
This test verifies that resolve_prog_type() works as expected when `attach_prog_fd` is not passed in. `prog->aux->dst_prog` in resolve_prog_type() is assigned by `attach_prog_fd`, and would be NULL if `attach_prog_fd` is not provided. Loading EXT prog with bpf_dynptr_from_skb() kfunc call in this way will lead to null-pointer-deref. Verify that the null-pointer-deref bug in resolve_prog_type() is fixed. Signed-off-by: Tengda Wu <wutengda@huaweicloud.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240711145819.254178-3-wutengda@huaweicloud.com
2024-07-12bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXTTengda Wu
When loading a EXT program without specifying `attr->attach_prog_fd`, the `prog->aux->dst_prog` will be null. At this time, calling resolve_prog_type() anywhere will result in a null pointer dereference. Example stack trace: [ 8.107863] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004 [ 8.108262] Mem abort info: [ 8.108384] ESR = 0x0000000096000004 [ 8.108547] EC = 0x25: DABT (current EL), IL = 32 bits [ 8.108722] SET = 0, FnV = 0 [ 8.108827] EA = 0, S1PTW = 0 [ 8.108939] FSC = 0x04: level 0 translation fault [ 8.109102] Data abort info: [ 8.109203] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 8.109399] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 8.109614] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 8.109836] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101354000 [ 8.110011] [0000000000000004] pgd=0000000000000000, p4d=0000000000000000 [ 8.112624] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP [ 8.112783] Modules linked in: [ 8.113120] CPU: 0 PID: 99 Comm: may_access_dire Not tainted 6.10.0-rc3-next-20240613-dirty #1 [ 8.113230] Hardware name: linux,dummy-virt (DT) [ 8.113390] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 8.113429] pc : may_access_direct_pkt_data+0x24/0xa0 [ 8.113746] lr : add_subprog_and_kfunc+0x634/0x8e8 [ 8.113798] sp : ffff80008283b9f0 [ 8.113813] x29: ffff80008283b9f0 x28: ffff800082795048 x27: 0000000000000001 [ 8.113881] x26: ffff0000c0bb2600 x25: 0000000000000000 x24: 0000000000000000 [ 8.113897] x23: ffff0000c1134000 x22: 000000000001864f x21: ffff0000c1138000 [ 8.113912] x20: 0000000000000001 x19: ffff0000c12b8000 x18: ffffffffffffffff [ 8.113929] x17: 0000000000000000 x16: 0000000000000000 x15: 0720072007200720 [ 8.113944] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720 [ 8.113958] x11: 0720072007200720 x10: 0000000000f9fca4 x9 : ffff80008021f4e4 [ 8.113991] x8 : 0101010101010101 x7 : 746f72705f6d656d x6 : 000000001e0e0f5f [ 8.114006] x5 : 000000000001864f x4 : ffff0000c12b8000 x3 : 000000000000001c [ 8.114020] x2 : 0000000000000002 x1 : 0000000000000000 x0 : 0000000000000000 [ 8.114126] Call trace: [ 8.114159] may_access_direct_pkt_data+0x24/0xa0 [ 8.114202] bpf_check+0x3bc/0x28c0 [ 8.114214] bpf_prog_load+0x658/0xa58 [ 8.114227] __sys_bpf+0xc50/0x2250 [ 8.114240] __arm64_sys_bpf+0x28/0x40 [ 8.114254] invoke_syscall.constprop.0+0x54/0xf0 [ 8.114273] do_el0_svc+0x4c/0xd8 [ 8.114289] el0_svc+0x3c/0x140 [ 8.114305] el0t_64_sync_handler+0x134/0x150 [ 8.114331] el0t_64_sync+0x168/0x170 [ 8.114477] Code: 7100707f 54000081 f9401c00 f9403800 (b9400403) [ 8.118672] ---[ end trace 0000000000000000 ]--- One way to fix it is by forcing `attach_prog_fd` non-empty when bpf_prog_load(). But this will lead to `libbpf_probe_bpf_prog_type` API broken which use verifier log to probe prog type and will log nothing if we reject invalid EXT prog before bpf_check(). Another way is by adding null check in resolve_prog_type(). The issue was introduced by commit 4a9c7bbe2ed4 ("bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT") which wanted to correct type resolution for BPF_PROG_TYPE_TRACING programs. Before that, the type resolution of BPF_PROG_TYPE_EXT prog actually follows the logic below: prog->aux->dst_prog ? prog->aux->dst_prog->type : prog->type; It implies that when EXT program is not yet attached to `dst_prog`, the prog type should be EXT itself. This code worked fine in the past. So just keep using it. Fix this by returning `prog->type` for BPF_PROG_TYPE_EXT if `dst_prog` is not present in resolve_prog_type(). Fixes: 4a9c7bbe2ed4 ("bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT") Signed-off-by: Tengda Wu <wutengda@huaweicloud.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20240711145819.254178-2-wutengda@huaweicloud.com
2024-07-12ubifs: add check for crypto_shash_tfm_digestChen Ni
Add check for the return value of crypto_shash_tfm_digest() and return the error if it fails in order to catch the error. Fixes: 817aa094842d ("ubifs: support offline signed images") Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-07-12ubifs: Fix inconsistent inode size when powercut happens during appendant ↵Zhihao Cheng
writing UBIFS always make sure that the data length won't beyond the inode size by writing inode before writing page(See ubifs_writepage.). After commit c35acef383f4a2f2cfc30("ubifs: Convert ubifs_writepage to use a folio"), the rule is broken in one case: Given a file with size 3, then write 4096 from the offset 0, following process will make inode size be smaller than file data length after powercut & recovery: P1 P2 ubifs_writepage len = folio_size(folio) // 4096 if (folio_pos(folio) + len <= i_size) // condition 1: 0 + 4096 <= 4096 //(i_size is updated as 4096 in ubifs_write_end) if (folio_pos(folio) >= synced_i_size) // condition 2: 0 >= 3, false write_inode // Skipped, because condition 2 is false do_writepage(folio, len) // write one page do_commit // data node won't be replayed in next mounting >> Powercut << So, inode size(4096) is not updated into disk, we will get following error messages in next mounting(chk_fs = 1): check_leaf [ubifs]: data node at LEB 14:2048 is not within inode size 3 dbg_walk_index [ubifs]: leaf checking function returned error -22, for leaf at LEB 14:2048 Fix it by modifying condition 2 as original comparison(Compare the page index of synced_i_size with current page index). Fixes: c35acef383f4 ("ubifs: Convert ubifs_writepage to use a folio") Link: https://bugzilla.kernel.org/show_bug.cgi?id=218934 Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-07-12ubi: block: fix null-pointer-dereference in ubiblock_create()Li Nan
Similar to commit adbf4c4954e3 ("ubi: block: fix memleak in ubiblock_create()"), 'dev->gd' is not assigned but dereferenced if blk_mq_alloc_tag_set() fails, and leading to a null-pointer-dereference. Fix it by using pr_err() and variable 'dev' to print error log. Additionally, the log in the error handle path of idr_alloc() has been improved by using pr_err(), too. Before initializing device name, using dev_err() will print error log with 'null' instead of the actual device name, like this: block (null): ... ~~~~~~ It is unclear. Using pr_err() can print more details of the device. The improved log is: ubiblock0_0: ... Fixes: 77567b25ab9f ("ubi: use blk_mq_alloc_disk and blk_cleanup_disk") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-07-12ubifs: fix kernel-doc warningsJeff Johnson
make C=1 reports the following kernel-doc warnings: fs/ubifs/compress.c:103: warning: Function parameter or struct member 'c' not described in 'ubifs_compress' fs/ubifs/compress.c:155: warning: Function parameter or struct member 'c' not described in 'ubifs_decompress' fs/ubifs/find.c:353: warning: Excess function parameter 'data' description in 'scan_for_free_cb' fs/ubifs/find.c:353: warning: Function parameter or struct member 'arg' not described in 'scan_for_free_cb' fs/ubifs/find.c:594: warning: Excess function parameter 'data' description in 'scan_for_idx_cb' fs/ubifs/find.c:594: warning: Function parameter or struct member 'arg' not described in 'scan_for_idx_cb' fs/ubifs/find.c:786: warning: Excess function parameter 'data' description in 'scan_dirty_idx_cb' fs/ubifs/find.c:786: warning: Function parameter or struct member 'arg' not described in 'scan_dirty_idx_cb' fs/ubifs/find.c:86: warning: Excess function parameter 'data' description in 'scan_for_dirty_cb' fs/ubifs/find.c:86: warning: Function parameter or struct member 'arg' not described in 'scan_for_dirty_cb' fs/ubifs/journal.c:369: warning: expecting prototype for wake_up_reservation(). Prototype was for add_or_start_queue() instead fs/ubifs/lprops.c:1018: warning: Excess function parameter 'lst' description in 'scan_check_cb' fs/ubifs/lprops.c:1018: warning: Function parameter or struct member 'arg' not described in 'scan_check_cb' fs/ubifs/lpt.c:1938: warning: Function parameter or struct member 'ptr' not described in 'lpt_scan_node' fs/ubifs/replay.c:60: warning: Function parameter or struct member 'hash' not described in 'replay_entry' Fix them. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-07-12ubifs: correct UBIFS_DFS_DIR_LEN macro definition and improve code clarityZhaoLong Wang
The UBIFS_DFS_DIR_LEN macro, which defines the maximum length of the UBIFS debugfs directory name, has an incorrect formula and misleading comments. The current formula is (3 + 1 + 2*2 + 1), which assumes that both UBI device number and volume ID are limited to 2 characters. However, UBI device number ranges from 0 to 31 (2 characters), and volume ID ranges from 0 to 127 (up to 3 characters). Although the current code works due to the cancellation of mathematical errors (9 + 1 = 10, which matches the correct UBIFS_DFS_DIR_LEN value), it can lead to confusion and potential issues in the future. This patch aims to improve the code clarity and maintainability by making the following changes: 1. Corrects the UBIFS_DFS_DIR_LEN macro definition to (3 + 1 + 2 + 3 + 1), accommodating the maximum lengths of both UBI device number and volume ID, plus the separators and null terminator. 2. Updates the snprintf calls to use UBIFS_DFS_DIR_LEN instead of UBIFS_DFS_DIR_LEN + 1, removing the unnecessary +1. 3. Modifies the error checks to compare against UBIFS_DFS_DIR_LEN using >= instead of >, aligning with the corrected macro definition. 4. Removes the redundant +1 in the dfs_dir_name array definitions in ubi.h and debug.h. While these changes do not affect the runtime behavior, they make the code more readable, maintainable, and less prone to future errors. v2->v3: - Removes the duplicated UBIFS_DFS_DIR_LEN and UBIFS_DFS_DIR_NAME macro definitions in ubifs.h, as they are already defined in debug.h. Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-07-12mtd: ubi: Restore missing cleanup on ubi_init() failure pathBen Hutchings
We need to clean-up debugfs and ubiblock if we fail after initialising them. Signed-off-by: Ben Hutchings <ben.hutchings@mind.be> Fixes: 927c145208b0 ("mtd: ubi: attach from device tree") Signed-off-by: Richard Weinberger <richard@nod.at>
2024-07-12ubifs: dbg_orphan_check: Fix missed key type checkingZhihao Cheng
When selinux/encryption is enabled, xattr entry node is added into TNC before host inode when creating new file. So it is possible to find xattr entry without host inode from TNC. Orphan debug checking is called by ubifs_orphan_end_commit(), at that time, the commit semaphore is already unlock, so the new creation won't be blocked. Fixes: d7f0b70d30ff ("UBIFS: Add security.* XATTR support for the UBIFS") Fixes: d475a507457b ("ubifs: Add skeleton for fscrypto") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-07-12ubifs: Fix unattached inode when powercut happens in creatingZhihao Cheng
For selinux or encryption scenarios, UBIFS could become inconsistent while creating new files in powercut case. Encryption/selinux related xattrs will be created before creating file dentry, which makes creation process is not atomic, details are shown as: Encryption case: ubifs_create ubifs_new_inode fscrypt_set_context ubifs_xattr_set create_xattr ubifs_jnl_update // Disk: xentry xinode inode(LAST_OF_NODE_GROUP) >> power cut << ubifs_jnl_update // Disk: dentry inode parent_inode(LAST_OF_NODE_GROUP) Selinux case: ubifs_create ubifs_new_inode ubifs_init_security security_inode_init_security ubifs_xattr_set create_xattr ubifs_jnl_update // Disk: xentry xinode inode(LAST_OF_NODE_GROUP) >> power cut << ubifs_jnl_update // Disk: dentry inode parent_inode(LAST_OF_NODE_GROUP) Above process will make chk_fs failed in next mounting: UBIFS error (ubi0:0 pid 7995): dbg_check_filesystem [ubifs]: inode 66 nlink is 1, but calculated nlink is 0 Fix it by allocating orphan inode for each non-xattr file creation, then removing orphan list in journal writing process, which ensures that both xattr and dentry be effective in atomic when powercut happens. Fixes: d7f0b70d30ff ("UBIFS: Add security.* XATTR support for the UBIFS") Fixes: d475a507457b ("ubifs: Add skeleton for fscrypto") Link: https://bugzilla.kernel.org/show_bug.cgi?id=218309 Suggested-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-07-12ubifs: Fix space leak when powercut happens in linking tmpfileZhihao Cheng
There is a potential space leak problem when powercut happens in linking tmpfile, in which case, inode node (with nlink=0) and its' data nodes can be found from tnc (on flash), but there are no dentries related to the inode, so the file is invisible but takes free space. Detailed process is shown as: ubifs_tmpfile ubifs_jnl_update // Add bud A into log area ubifs_add_orphan // Add inode into orphan list P1 P2 ubifs_link ubifs_delete_orphan // Delete inode from orphan list, then inode won't // be written into orphan area, there is no chance // to delete inode by replaying orphan. commit // bud A won't be replayed in next mounting >> powercut << ubifs_jnl_update // Link inode to dentry The root cause is that orphan entry deletion and journal writing(for link) are interrupted by commit, which makes the two operations are not atomic. Fix it by doing ubifs_delete_orphan under the protection of c->commit_sem within ubifs_jnl_update. This is also a preparation to support all creating new files by orphan inode. v1 is https://lore.kernel.org/linux-mtd/20200701093227.674945-1-chengzhihao1@huawei.com/ Fixes: 32fe905c17f0 ("ubifs: Fix O_TMPFILE corner case in ubifs_link()") Link: https://bugzilla.kernel.org/show_bug.cgi?id=208405 Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-07-12ubifs: Move ui->data initialization after initializing securityZhihao Cheng
Host inode and its' xattr will be written on disk after initializing security when creating symlink or dev, then the host inode and its dentry will be written again in ubifs_jnl_update. There is no need to write inode data in the security initialization pass, just move the ui->data initialization after initializing security. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-07-12ubifs: Fix adding orphan entry twice for the same inodeZhihao Cheng
The tmpfile could be added into orphan list twice, first time is creation, the second time is removing after it is linked. The orphan entry could be added twice for tmpfile if following sequence is satisfied: ubifs_tmpfile ubifs_jnl_update ubifs_add_orphan // first time to add orphan entry P1 P2 ubifs_link do_commit ubifs_orphan_start_commit orphan->cmt = 1 ubifs_delete_orphan orphan_delete if (orph->cmt) orph->del = 1; // orphan entry is not deleted from tree return ubifs_unlink ubifs_jnl_update ubifs_add_orphan orphan_add // found old orphan entry, second time to add orphan entry ubifs_err(c, "orphaned twice") return -EINVAL // unlink failed! ubifs_orphan_end_commit erase_deleted // delete old orphan entry rb_erase(&orphan->rb, &c->orph_tree) Fix it by removing orphan entry from orphan tree in advance, rather than remove it from orphan tree in committing process. Fixes: 32fe905c17f0 ("ubifs: Fix O_TMPFILE corner case in ubifs_link()") Link: https://bugzilla.kernel.org/show_bug.cgi?id=218672 Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-07-12ubifs: Remove insert_dead_orphan from replaying orphan processZhihao Cheng
UBIFS will do commit at the end of mounting process(rw mode), dead orphans(added by insert_dead_orphan in replaying orphan) are deleted by ubifs_orphan_end_commit(). The only reason why dead orphans are added into orphan list is that old orpans may be lost when powercut happens in ubifs_orphan_end_commit(): ubifs_orphan_end_commit // TNC(updated by orphans) is not written yet if (c->cmt_orphans != 0) commit_orphans consolidate // traverse orphan list write_orph_nodes // rewrite all orphans by ubifs_leb_change // If dead orphans are not in list, they will be lost when powercut // happens, then TNC won't be updated by old orphans in next mounting. Luckily, the condition 'c->cmt_orphans != 0' will never be true in mounting process, there can't be new orphans added into orphan list before mounting returned, but commit will be done at the end of mounting. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-07-12Revert "ubifs: ubifs_symlink: Fix memleak of inode->i_link in error path"Zhihao Cheng
This reverts commit 6379b44cdcd67f5f5d986b73953e99700591edfa. Commit 1e022216dcd2 ("ubifs: ubifs_symlink: Fix memleak of inode->i_link in error path") is applied again in commit 6379b44cdcd6 ("ubifs: ubifs_symlink: Fix memleak of inode->i_link in error path"), which changed ubifs_mknod (It won't become a real problem). Just revert it. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-07-12ubifs: Don't add xattr inode into orphan areaZhihao Cheng
Now, the entire inode with its' xattrs are removed while replaying orphan nodes. There is no need to add xattr inodes into orphan area, which is based on the fact that xattr entries won't be cleared from disk before deleting xattr inodes, in another words, current logic can make sure that xattr inode be deleted in any cases even UBIFS not record xattr inode into orphan area. Let's looking for possible paths that could clear xattr entries from disk but leave the xattr inode on TNC: 1. unlink/tmpfile -> ubifs_jnl_update: inode(nlink=0) is written into bud LEB and added into orphan list, then: a. powercut: ubifs_tnc_remove_ino(xattr entry/inode can be found from TNC and being deleted) is invoked in replaying journal. b. commit + powercut: inode is written into orphan area, and ubifs_tnc_remove_ino is invoked in replaying orphan nodes. c. evicting + powercut: xattr inode(nlink=0) is written on disk, xattr is removed from TNC, gc could clear xattr entries from disk. ubifs_tnc_remove_ino will apply on inode and xattr inode in replaying journal, so lost xattr entries will make no influence. d. evicting + commit + powercut: xattr inode/entry are removed from index tree(on disk) by ubifs_jnl_write_inode, xattr inode is cleared from orphan area by ubifs_jnl_write_inode + commit. e. commit + evicting + powercut: inode is written into orphan area, then equivalent to c. 2. remove xattr -> ubifs_jnl_delete_xattr: xattr entry(inum=0) and xattr inode(nlink=0) is written into bud LEB, xattr entry/inode are removed from TNC, then: a. powercut: gc could clear xattr entries from disk, which won't affect deleting xattr entry from TNC. ubifs_tnc_remove_ino will apply on xattr inode in replaying journal, ubifs_tnc_remove_nm will apply on xattr entry in replaying journal. b. commit + powercut: xattr entry/inode are removed from index tree (on disk). Tracking xattr inode in orphan list is imported by commit 988bec41318f3f ("ubifs: orphan: Handle xattrs like files"), it aims to fix the similar problem described in commit 7959cf3a7506d4a ("ubifs: journal: Handle xattrs like files"). Actually, the problem only exist in journal case but not the orphan case. So, we can remove the orphan tracking for xattr inodes. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-07-12ubifs: Fix unattached xattr inode if powercut happens after deletingZhihao Cheng
When powercut happens after deleting file, the xattr inode could be alone existing in TNC but its' xattr entry cannot be found in TNC. File inode and xattr inode are added into orphan list after deleting file, file inode's nlink is 0 but xattr inode's nlink is not 0 (PS: zero nlink xattr inode is written on disk in evicting process by ubifs_jnl_write_inode). So, following process could happen: 1. touch file 2. setxattr(file) 3. unlink file // inode(nlink=0), xattr inode(nlink=1) are added into orphan list 4. commit // write inode inum and xattr inum into orphan area 5. powercut 6. mount do_kill_orphans // inode(nlink=0) is deleted from TNC by ubifs_tnc_remove_range, // xattr entry is deleted too. // xattr inode(nlink=1) is not deleted from TNC Finally we could see following error while debugging UBIFS: UBIFS error (ubi0:0 pid 1093): dbg_check_filesystem [ubifs]: inode 66 nlink is 1, but calculated nlink is 0 UBIFS (ubi0:0): dump of the inode 66 sitting in LEB 12:2128 node_type 0 (inode node) group_type 1 (in node group) len 197 key (66, inode) size 37 nlink 1 flags 0x20 xattr_cnt 0 xattr_size 0 xattr_names 0 data len 37 Fix it by removing entire inode with it's xattrs while replaying orphan, just replace function ubifs_tnc_remove_range by ubifs_tnc_remove_ino. Fixes: ee1438ce5dc4 ("ubifs: Check link count of inodes when killing orphans.") Link: https://bugzilla.kernel.org/show_bug.cgi?id=218661 Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-07-12Merge tag 'for-6.10-rc7-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Fix a regression in extent map shrinker behaviour. In the past weeks we got reports from users that there are huge latency spikes or freezes. This was bisected to newly added shrinker of extent maps (it was added to fix a build up of the structures in memory). I'm assuming that the freezes would happen to many users after release so I'd like to get it merged now so it's in 6.10. Although the diff size is not small the changes are relatively straightforward, the reporters verified the fixes and we did testing on our side. The fixes: - adjust behaviour under memory pressure and check lock or scheduling conditions, bail out if needed - synchronize tracking of the scanning progress so inode ranges are not skipped or work duplicated - do a delayed iput when scanning a root so evicting an inode does not slow things down in case of lots of dirty data, also fix lockdep warning, a deadlock could happen when writing the dirty data would need to start a transaction" * tag 'for-6.10-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: avoid races when tracking progress for extent map shrinking btrfs: stop extent map shrinker if reschedule is needed btrfs: use delayed iput during extent map shrinking
2024-07-12dt-bindings: incomplete-devices: document devices without bindingsKrzysztof Kozlowski
There are devices in the wild with non-updatable firmware coming with ACPI tables with rejected compatibles, e.g. "ltr,ltrf216a". Linux kernel still supports this device via ACPI PRP0001, however the compatible was never accepted to bindings. There are also several early PowerPC or SPARC platforms using compatibles for their OpenFirmware, but without in-tree DTS. Often the legacy compatible is not correct in terms of current Devicetree specification, e.g. missing vendor prefix. Finally there are also Linux-specific tools and test code with compatibles. Add a schema covering above cases purely to satisfy the DT schema and scripts/checkpatch.pl checks for undocumented compatibles. For ltr,ltrf216a this also documents the consensus: compatible is allowed only via ACPI PRP0001, but not bindings. Link: https://lore.kernel.org/all/20240705095047.90558-1-marex@denx.de/ Link: https://lore.kernel.org/lkml/20220731173446.7400bfa8@jic23-huawei/T/#me55be502302d70424a85368c2645c89f860b7b40 Cc: Marek Vasut <marex@denx.de> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240712121146.90942-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2024-07-12mtd: ubi: avoid expensive do_div() on 32-bit machinesArnd Bergmann
The use of do_div() in ubi_nvmem_reg_read() makes calling it on 32-bit machines rather expensive. Since the 'from' variable is known to be a 32-bit quantity, it is clearly never needed and can be optimized into a regular division operation. Fixes: b8a77b9a5f9c ("mtd: ubi: fix NVMEM over UBI volumes on 32-bit systems") Fixes: 3ce485803da1 ("mtd: ubi: provide NVMEM layer over UBI volumes") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-07-12mtd: ubi: make ubi_class constantRicardo B. Marliere
Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the ubi_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-07-12ubi: eba: properly rollback inside self_check_ebaFedor Pchelkin
In case of a memory allocation failure in the volumes loop we can only process the already allocated scan_eba and fm_eba array elements on the error path - others are still uninitialized. Found by Linux Verification Center (linuxtesting.org). Fixes: 00abf3041590 ("UBI: Add self_check_eba()") Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-07-12Merge tag 'ceph-for-6.10-rc8' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "A fix for a possible use-after-free following "rbd unmap" or "umount" marked for stable and two kernel-doc fixups" * tag 'ceph-for-6.10-rc8' of https://github.com/ceph/ceph-client: libceph: fix crush_choose_firstn() kernel-doc warnings libceph: suppress crush_choose_indep() kernel-doc warnings libceph: fix race between delayed_work() and ceph_monc_stop()
2024-07-12Merge tag 'pmdomain-v6.10-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm Pull pmdomain fix from Ulf Hansson: - qcom: Skip retention level for rpmhpd's * tag 'pmdomain-v6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: pmdomain: qcom: rpmhpd: Skip retention level for Power Domains
2024-07-12Merge tag 'mmc-v6.10-rc4-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC host fixes from Ulf Hansson: - davinci_mmc: Prevent transmitted data size from exceeding sgm's length - sdhci: Fix max_seg_size for 64KiB PAGE_SIZE * tag 'mmc-v6.10-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: davinci_mmc: Prevent transmitted data size from exceeding sgm's length mmc: sdhci: Fix max_seg_size for 64KiB PAGE_SIZE
2024-07-12nfsd: nfsd_file_lease_notifier_call gets a file_lease as an argumentJeff Layton
"data" actually refers to a file_lease and not a file_lock. Both structs have their file_lock_core as the first field though, so this bug should be harmless without struct randomization in play. Reported-by: Florian Evers <florian-evers@gmx.de> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219008 Fixes: 05580bbfc6bc ("nfsd: adapt to breakup of struct file_lock") Signed-off-by: Jeff Layton <jlayton@kernel.org> Tested-by: Florian Evers <florian-evers@gmx.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-07-12perf trace: Fix iteration of syscall ids in syscalltbl->entriesHoward Chu
This is a bug found when implementing pretty-printing for the landlock_add_rule system call, I decided to send this patch separately because this is a serious bug that should be fixed fast. I wrote a test program to do landlock_add_rule syscall in a loop, yet perf trace -e landlock_add_rule freezes, giving no output. This bug is introduced by the false understanding of the variable "key" below: ``` for (key = 0; key < trace->sctbl->syscalls.nr_entries; ++key) { struct syscall *sc = trace__syscall_info(trace, NULL, key); ... } ``` The code above seems right at the beginning, but when looking at syscalltbl.c, I found these lines: ``` for (i = 0; i <= syscalltbl_native_max_id; ++i) if (syscalltbl_native[i]) ++nr_entries; entries = tbl->syscalls.entries = malloc(sizeof(struct syscall) * nr_entries); ... for (i = 0, j = 0; i <= syscalltbl_native_max_id; ++i) { if (syscalltbl_native[i]) { entries[j].name = syscalltbl_native[i]; entries[j].id = i; ++j; } } ``` meaning the key is merely an index to traverse the syscall table, instead of the actual syscall id for this particular syscall. So if one uses key to do trace__syscall_info(trace, NULL, key), because key only goes up to trace->sctbl->syscalls.nr_entries, for example, on my X86_64 machine, this number is 373, it will end up neglecting all the rest of the syscall, in my case, everything after `rseq`, because the traversal will stop at 373, and `rseq` is the last syscall whose id is lower than 373 in tools/perf/arch/x86/include/generated/asm/syscalls_64.c: ``` ... [334] = "rseq", [424] = "pidfd_send_signal", ... ``` The reason why the key is scrambled but perf trace works well is that key is used in trace__syscall_info(trace, NULL, key) to do trace->syscalls.table[id], this makes sure that the struct syscall returned actually has an id the same value as key, making the later bpf_prog matching all correct. After fixing this bug, I can do perf trace on 38 more syscalls, and because more syscalls are visible, we get 8 more syscalls that can be augmented. before: perf $ perf trace -vv --max-events=1 |& grep Reusing Reusing "open" BPF sys_enter augmenter for "stat" Reusing "open" BPF sys_enter augmenter for "lstat" Reusing "open" BPF sys_enter augmenter for "access" Reusing "connect" BPF sys_enter augmenter for "accept" Reusing "sendto" BPF sys_enter augmenter for "recvfrom" Reusing "connect" BPF sys_enter augmenter for "bind" Reusing "connect" BPF sys_enter augmenter for "getsockname" Reusing "connect" BPF sys_enter augmenter for "getpeername" Reusing "open" BPF sys_enter augmenter for "execve" Reusing "open" BPF sys_enter augmenter for "truncate" Reusing "open" BPF sys_enter augmenter for "chdir" Reusing "open" BPF sys_enter augmenter for "mkdir" Reusing "open" BPF sys_enter augmenter for "rmdir" Reusing "open" BPF sys_enter augmenter for "creat" Reusing "open" BPF sys_enter augmenter for "link" Reusing "open" BPF sys_enter augmenter for "unlink" Reusing "open" BPF sys_enter augmenter for "symlink" Reusing "open" BPF sys_enter augmenter for "readlink" Reusing "open" BPF sys_enter augmenter for "chmod" Reusing "open" BPF sys_enter augmenter for "chown" Reusing "open" BPF sys_enter augmenter for "lchown" Reusing "open" BPF sys_enter augmenter for "mknod" Reusing "open" BPF sys_enter augmenter for "statfs" Reusing "open" BPF sys_enter augmenter for "pivot_root" Reusing "open" BPF sys_enter augmenter for "chroot" Reusing "open" BPF sys_enter augmenter for "acct" Reusing "open" BPF sys_enter augmenter for "swapon" Reusing "open" BPF sys_enter augmenter for "swapoff" Reusing "open" BPF sys_enter augmenter for "delete_module" Reusing "open" BPF sys_enter augmenter for "setxattr" Reusing "open" BPF sys_enter augmenter for "lsetxattr" Reusing "openat" BPF sys_enter augmenter for "fsetxattr" Reusing "open" BPF sys_enter augmenter for "getxattr" Reusing "open" BPF sys_enter augmenter for "lgetxattr" Reusing "openat" BPF sys_enter augmenter for "fgetxattr" Reusing "open" BPF sys_enter augmenter for "listxattr" Reusing "open" BPF sys_enter augmenter for "llistxattr" Reusing "open" BPF sys_enter augmenter for "removexattr" Reusing "open" BPF sys_enter augmenter for "lremovexattr" Reusing "fsetxattr" BPF sys_enter augmenter for "fremovexattr" Reusing "open" BPF sys_enter augmenter for "mq_open" Reusing "open" BPF sys_enter augmenter for "mq_unlink" Reusing "fsetxattr" BPF sys_enter augmenter for "add_key" Reusing "fremovexattr" BPF sys_enter augmenter for "request_key" Reusing "fremovexattr" BPF sys_enter augmenter for "inotify_add_watch" Reusing "fremovexattr" BPF sys_enter augmenter for "mkdirat" Reusing "fremovexattr" BPF sys_enter augmenter for "mknodat" Reusing "fremovexattr" BPF sys_enter augmenter for "fchownat" Reusing "fremovexattr" BPF sys_enter augmenter for "futimesat" Reusing "fremovexattr" BPF sys_enter augmenter for "newfstatat" Reusing "fremovexattr" BPF sys_enter augmenter for "unlinkat" Reusing "fremovexattr" BPF sys_enter augmenter for "linkat" Reusing "open" BPF sys_enter augmenter for "symlinkat" Reusing "fremovexattr" BPF sys_enter augmenter for "readlinkat" Reusing "fremovexattr" BPF sys_enter augmenter for "fchmodat" Reusing "fremovexattr" BPF sys_enter augmenter for "faccessat" Reusing "fremovexattr" BPF sys_enter augmenter for "utimensat" Reusing "connect" BPF sys_enter augmenter for "accept4" Reusing "fremovexattr" BPF sys_enter augmenter for "name_to_handle_at" Reusing "fremovexattr" BPF sys_enter augmenter for "renameat2" Reusing "open" BPF sys_enter augmenter for "memfd_create" Reusing "fremovexattr" BPF sys_enter augmenter for "execveat" Reusing "fremovexattr" BPF sys_enter augmenter for "statx" after perf $ perf trace -vv --max-events=1 |& grep Reusing Reusing "open" BPF sys_enter augmenter for "stat" Reusing "open" BPF sys_enter augmenter for "lstat" Reusing "open" BPF sys_enter augmenter for "access" Reusing "connect" BPF sys_enter augmenter for "accept" Reusing "sendto" BPF sys_enter augmenter for "recvfrom" Reusing "connect" BPF sys_enter augmenter for "bind" Reusing "connect" BPF sys_enter augmenter for "getsockname" Reusing "connect" BPF sys_enter augmenter for "getpeername" Reusing "open" BPF sys_enter augmenter for "execve" Reusing "open" BPF sys_enter augmenter for "truncate" Reusing "open" BPF sys_enter augmenter for "chdir" Reusing "open" BPF sys_enter augmenter for "mkdir" Reusing "open" BPF sys_enter augmenter for "rmdir" Reusing "open" BPF sys_enter augmenter for "creat" Reusing "open" BPF sys_enter augmenter for "link" Reusing "open" BPF sys_enter augmenter for "unlink" Reusing "open" BPF sys_enter augmenter for "symlink" Reusing "open" BPF sys_enter augmenter for "readlink" Reusing "open" BPF sys_enter augmenter for "chmod" Reusing "open" BPF sys_enter augmenter for "chown" Reusing "open" BPF sys_enter augmenter for "lchown" Reusing "open" BPF sys_enter augmenter for "mknod" Reusing "open" BPF sys_enter augmenter for "statfs" Reusing "open" BPF sys_enter augmenter for "pivot_root" Reusing "open" BPF sys_enter augmenter for "chroot" Reusing "open" BPF sys_enter augmenter for "acct" Reusing "open" BPF sys_enter augmenter for "swapon" Reusing "open" BPF sys_enter augmenter for "swapoff" Reusing "open" BPF sys_enter augmenter for "delete_module" Reusing "open" BPF sys_enter augmenter for "setxattr" Reusing "open" BPF sys_enter augmenter for "lsetxattr" Reusing "openat" BPF sys_enter augmenter for "fsetxattr" Reusing "open" BPF sys_enter augmenter for "getxattr" Reusing "open" BPF sys_enter augmenter for "lgetxattr" Reusing "openat" BPF sys_enter augmenter for "fgetxattr" Reusing "open" BPF sys_enter augmenter for "listxattr" Reusing "open" BPF sys_enter augmenter for "llistxattr" Reusing "open" BPF sys_enter augmenter for "removexattr" Reusing "open" BPF sys_enter augmenter for "lremovexattr" Reusing "fsetxattr" BPF sys_enter augmenter for "fremovexattr" Reusing "open" BPF sys_enter augmenter for "mq_open" Reusing "open" BPF sys_enter augmenter for "mq_unlink" Reusing "fsetxattr" BPF sys_enter augmenter for "add_key" Reusing "fremovexattr" BPF sys_enter augmenter for "request_key" Reusing "fremovexattr" BPF sys_enter augmenter for "inotify_add_watch" Reusing "fremovexattr" BPF sys_enter augmenter for "mkdirat" Reusing "fremovexattr" BPF sys_enter augmenter for "mknodat" Reusing "fremovexattr" BPF sys_enter augmenter for "fchownat" Reusing "fremovexattr" BPF sys_enter augmenter for "futimesat" Reusing "fremovexattr" BPF sys_enter augmenter for "newfstatat" Reusing "fremovexattr" BPF sys_enter augmenter for "unlinkat" Reusing "fremovexattr" BPF sys_enter augmenter for "linkat" Reusing "open" BPF sys_enter augmenter for "symlinkat" Reusing "fremovexattr" BPF sys_enter augmenter for "readlinkat" Reusing "fremovexattr" BPF sys_enter augmenter for "fchmodat" Reusing "fremovexattr" BPF sys_enter augmenter for "faccessat" Reusing "fremovexattr" BPF sys_enter augmenter for "utimensat" Reusing "connect" BPF sys_enter augmenter for "accept4" Reusing "fremovexattr" BPF sys_enter augmenter for "name_to_handle_at" Reusing "fremovexattr" BPF sys_enter augmenter for "renameat2" Reusing "open" BPF sys_enter augmenter for "memfd_create" Reusing "fremovexattr" BPF sys_enter augmenter for "execveat" Reusing "fremovexattr" BPF sys_enter augmenter for "statx" TL;DR: These are the new syscalls that can be augmented Reusing "openat" BPF sys_enter augmenter for "open_tree" Reusing "openat" BPF sys_enter augmenter for "openat2" Reusing "openat" BPF sys_enter augmenter for "mount_setattr" Reusing "openat" BPF sys_enter augmenter for "move_mount" Reusing "open" BPF sys_enter augmenter for "fsopen" Reusing "openat" BPF sys_enter augmenter for "fspick" Reusing "openat" BPF sys_enter augmenter for "faccessat2" Reusing "openat" BPF sys_enter augmenter for "fchmodat2" as for the perf trace output: before perf $ perf trace -e faccessat2 --max-events=1 [no output] after perf $ ./perf trace -e faccessat2 --max-events=1 0.000 ( 0.037 ms): waybar/958 faccessat2(dfd: 40, filename: "uevent") = 0 P.S. The reason why this bug was not found in the past five years is probably because it only happens to the newer syscalls whose id is greater, for instance, faccessat2 of id 439, which not a lot of people care about when using perf trace. [Arnaldo]: notes That and the fact that the BPF code was hidden before having to use -e, that got changed kinda recently when we switched to using BPF skels for augmenting syscalls in 'perf trace': ⬢[acme@toolbox perf-tools-next]$ git log --oneline tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c a9f4c6c999008c92 perf trace: Collect sys_nanosleep first argument 29d16de26df17e94 perf augmented_raw_syscalls.bpf: Move 'struct timespec64' to vmlinux.h 5069211e2f0b47e7 perf trace: Use the right bpf_probe_read(_str) variant for reading user data 33b725ce7b988756 perf trace: Avoid compile error wrt redefining bool 7d9642311b6d9d31 perf bpf augmented_raw_syscalls: Add an assert to make sure sizeof(augmented_arg->value) is a power of two. 262b54b6c9396823 perf bpf augmented_raw_syscalls: Add an assert to make sure sizeof(saddr) is a power of two. 1836480429d173c0 perf bpf_skel augmented_raw_syscalls: Cap the socklen parameter using &= sizeof(saddr) cd2cece61ac5f900 perf trace: Tidy comments related to BPF + syscall augmentation 5e6da6be3082f77b perf trace: Migrate BPF augmentation to use a skeleton ⬢[acme@toolbox perf-tools-next]$ ⬢[acme@toolbox perf-tools-next]$ git show --oneline --pretty=reference 5e6da6be3082f77b | head -1 5e6da6be3082f77b (perf trace: Migrate BPF augmentation to use a skeleton, 2023-08-10) ⬢[acme@toolbox perf-tools-next]$ I.e. from August, 2023. One had as well to ask for BUILD_BPF_SKEL=1, which now is default if all it needs is available on the system. I simplified the code to not expose the 'struct syscall' outside of tools/perf/util/syscalltbl.c, instead providing a function to go from the index to the syscall id: int syscalltbl__id_at_idx(struct syscalltbl *tbl, int idx); Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/lkml/ZmhlAxbVcAKoPTg8@x1 Link: https://lore.kernel.org/r/20240705132059.853205-2-howardchu95@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-07-12cgroup: Add Michal Koutný as a maintainerTejun Heo
Michal has been contributing and reviewing patches across cgroup for a while now. Add him as a maintainer. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Michal Koutný <mkoutny@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Zefan Li <lizefan.x@bytedance.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-12cgroup/misc: Introduce misc.events.localXiu Jianfeng
Currently the event counting provided by misc.events is hierarchical, it's not practical if user is only concerned with events of a specified cgroup. Therefore, introduce misc.events.local collect events specific to the given cgroup. This is analogous to memory.events.local and pids.events.local. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-12dm: introduce the target flag mempool_needs_integrityMikulas Patocka
This commit introduces the dm target flag mempool_needs_integrity. When the flag is set, device mapper will call bioset_integrity_create on it's bio sets. The target can then call bio_integrity_alloc on the bios allocated from the table's mempool. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-07-12perf dso: Fix address sanitizer buildIan Rogers
Various files had been missed from having accessor functions added for the sake of dso reference count checking. Add the function calls and missing dso accessor functions. Fixes: ee756ef7491e ("perf dso: Add reference count checking and accessor functions") Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240704011745.1021288-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-07-12perf mem: Warn if memory events are not supported on all CPUsLeo Yan
It is possible that memory events are not supported on all CPUs. Prints a warning by dumping the enabled CPU maps in this case. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: John Garry <john.g.garry@oracle.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20240706152035.86983-3-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-07-12perf arm-spe: Support multiple Arm SPE PMUsLeo Yan
A platform can have more than one Arm SPE PMU. For example, a system with multiple clusters may have each cluster enabled with its own Arm SPE instance. In such case, the PMU devices will be named 'arm_spe_0', 'arm_spe_1', and so on. Currently, the tool only supports 'arm_spe_0'. This commit extends support to multiple Arm SPE PMUs by detecting the substring 'arm_spe_'. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: John Garry <john.g.garry@oracle.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20240706152035.86983-2-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-07-12perf build x86: Fix SC2034 error in syscalltbl.shHaoze Xie
Change the unused var in 'arch/x86/entry/syscalls/syscalltbl.sh' to '_' when reading from '$sorted_table'. This change allows the script to pass tests of ShellCheck before and after version 0.7.2 at the same time. When building in arch x86, syscalltbl.sh got a ShellCheck warning, which makes compilation error: In arch/x86/entry/syscalls/syscalltbl.sh line 27: while read nr _abi name entry _compat; do ^-^ SC2034: abi appears unused. Verify use (or export if used externally). ^----^ SC2034: compat appears unused. Verify use (or export if used externally). The script reads unused param abi and compat. It uses format '_xxx' to indicate dummy vars, which won't work properly when ShellCheck <= 0.7.2. According to SC2034, the more general way of writing is to use directly '_' to indicate discarding vars. 'entry' is also replaced by '_' because it just happens to be defined in emit function, otherwise it will lead to some misunderstandings. Link: https://www.shellcheck.net/wiki/SC2034 Signed-off-by: Haoze Xie <royenheart@gmail.com> Signed-off-by: Yuan Tan <tanyuan@tinylab.org> Link: https://lore.kernel.org/r/2143cab4cd8468c88860f4e5e382d0e6b4d89ac9.1720372178.git.royenheart@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-07-12perf record: Fix memset out-of-range errorHaoze Xie
Modified the object of 'memset' from '&lost.lost' to '&lost' in record__read_lost_samples. This allows 'memset' to access memory properly without causing out-of-bounds problems. The problems got from builtin-record.c are: In file included from /usr/include/string.h:495, from util/parse-events.h:13, from builtin-record.c:14: In function 'memset', inlined from 'record__read_lost_samples' at builtin-record.c:1958:6, inlined from '__cmd_record.constprop' at builtin-record.c:2817:2: /usr/include/x86_64-linux-gnu/bits/string_fortified.h:71:10: error: '__builtin_memset' offset [17, 64] from the object at 'lost' is out of the bounds of referenced subobject 'lost' with type 'struct perf_record_lost_samples' at offset 0 [-Werror=array-bounds] 71|return __builtin___memset_chk (__dest,__ch,__len,__bos0 (__dest)); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The error arised when performing a memset operation on the 'lost' variable, the bytes of 'sizeof(lost)' exceeds that of '&lost.lost', which are 64 and 16. Fixes: 6c1785cd75ef ("perf record: Ensure space for lost samples") Signed-off-by: Haoze Xie <royenheart@gmail.com> Signed-off-by: Yuan Tan <tanyuan@tinylab.org> Link: https://lore.kernel.org/r/11e12f171b846577cac698cd3999db3d7f6c4d03.1720372317.git.royenheart@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-07-12perf sched map: Add --fuzzy-name option for fuzzy matching in task namesMadadi Vineeth Reddy
The --fuzzy-name option can be used if fuzzy name matching is required. For example, "taskname" can be matched to any string that contains "taskname" as its substring. Sample output for --task-name wdav --fuzzy-name ============= . *A0 . . . . - . 131040.641346 secs A0 => wdavdaemon:62509 . A0 *B0 . . . - . 131040.641378 secs B0 => wdavdaemon:62274 . *- B0 . . . - . 131040.641379 secs *C0 . B0 . . . . . 131040.641572 secs C0 => wdavdaemon:62283 C0 . B0 . *D0 . . . 131040.641572 secs D0 => wdavdaemon:62277 C0 . B0 . D0 . *E0 . 131040.641578 secs E0 => wdavdaemon:62270 *- . B0 . D0 . E0 . 131040.641581 secs Suggested-by: Chen Yu <yu.c.chen@intel.com> Reviewed-and-tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Link: https://lore.kernel.org/r/20240707182716.22054-4-vineethr@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>