summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2024-10-29iommu: Introduce iommu_paging_domain_alloc_flags()Jason Gunthorpe
Currently drivers calls iommu_paging_domain_alloc(dev) to get an UNMANAGED domain. This is not sufficient to support PASID with UNMANAGED domain as some HW like AMD requires certain page table type to support PASIDs. Also the domain_alloc_paging op only passes device as param for domain allocation. This is not sufficient for AMD driver to decide the right page table. Instead of extending ops->domain_alloc_paging() it was decided to enhance ops->domain_alloc_user() so that caller can pass various additional flags. Hence add iommu_paging_domain_alloc_flags() API which takes flags as parameter. Caller can pass additional parameter to indicate type of domain required, etc. iommu_paging_domain_alloc_flags() internally calls appropriate callback function to allocate a domain. Signed-off-by: Jason Gunthorpe <jgg@ziepe.ca> [Added description - Vasant] Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20241028093810.5901-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-10-29iommu: Remove iommu_domain_alloc()Lu Baolu
The iommu_domain_alloc() interface is no longer used in the tree anymore. Remove it to avoid dead code. There is increasing demand for supporting multiple IOMMU drivers, and this is the last bus-based thing standing in the way of that. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241009041147.28391-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-10-29iommu: Remove useless flush from iommu_create_device_direct_mappings()Jason Gunthorpe
These days iommu_map() does not require external flushing, it always internally handles any required flushes. Since iommu_create_device_direct_mappings() only calls iommu_map(), remove the extra call. Since this is the last call site for iommu_flush_iotlb_all() remove it too. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/0-v1-bb6c694e1b07+a29e1-iommu_no_flush_all_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-10-29Merge tag 'amd-drm-next-6.13-2024-10-25' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.13-2024-10-25: amdgpu: - SDMA queue reset support - SMU 13.0.6 updates - Add debugfs interface to help limit jpeg queue scheduling for testing - JPEG 4.0.3 updates - Initial runtime repartitioning support - GFX9 fixes - Misc code cleanups - Rework IP structures to better handle multiple instances of an IP - DML updates - DSC fixes - HDR fixes - Brightness control updates - Runtime pm cleanup - DMCUB fixes - DCN 3.5 updates - Struct drm_edid cleanup - Fetch EDID from _DDC if available - Ring noop optimizations - MES logging fixes - 3DLUT fixes - DCN 4.x fixes - SMU 13.x fixes - Fixes for set_soft_freq_range() - ACPI fixes - SMU 14.x updates - PSR-SU fixes - fdinfo cleanup - DCN documentation updates amdkfd: - Misc code cleanups - Increase event FIFO size - Copy wave state fixes for SDMA radeon: - Fix possible overflow in packet3 check - Late init connector fix - Always set GEM function pointer Documentation: - Update drm-memory documentation From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241025132336.2416913-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-10-29dma-mapping: drop unneeded includes from dma-mapping.hChristoph Hellwig
Back in the day a lot of logic was implemented inline in dma-mapping.h and needed various includes. Move of this has long been moved out of line, so we can drop various includes to improve kernel rebuild times. Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-10-29dma-mapping: trace more error pathsSean Anderson
It can be surprising to the user if DMA functions are only traced on success. On failure, it can be unclear what the source of the problem is. Fix this by tracing all functions even when they fail. Cases where we BUG/WARN are skipped, since those should be sufficiently noisy already. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-10-29dma-mapping: use trace_dma_alloc for dma_alloc* instead of using trace_dma_mapSean Anderson
In some cases, we use trace_dma_map to trace dma_alloc* functions. This generally follows dma_debug. However, this does not record all of the relevant information for allocations, such as GFP flags. Create new dma_alloc tracepoints for these functions. Note that while dma_alloc_noncontiguous may allocate discontiguous pages (from the CPU's point of view), the device will only see one contiguous mapping. Therefore, we just need to trace dma_addr and size. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-10-29dma-mapping: trace dma_alloc/free directionSean Anderson
In preparation for using these tracepoints in a few more places, trace the DMA direction as well. For coherent allocations this is always bidirectional. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-10-29dma-mapping: use macros to define events in a classSean Anderson
Use a macro to avoid repeating the parameters and arguments for each event in a class. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-10-29dma-mapping: remove an outdated comment from dma-map-ops.hSui Jingfeng
The "/* CONFIG_ARCH_HAS_DMA_COHERENCE_H */" was an description about the ARCH_HAS_DMA_COHERENCE_H configure option, but it has been removed since the dma_default_coherent variable was lifted from the mips architecture to the driver core. Therefore it doesn't match any compile guard now. Just remove it. Fixes: 6d4e9a8efe3d ("driver core: lift dma_default_coherent into common code") Signed-off-by: Sui Jingfeng <sui.jingfeng@linux.dev> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-10-29cpufreq: add virtual-cpufreq driverDavid Dai
Introduce a virtualized cpufreq driver for guest kernels to improve performance and power of workloads within VMs. This driver does two main things: 1. Sends the frequency of vCPUs as a hint to the host. The host uses the hint to schedule the vCPU threads and decide physical CPU frequency. 2. If a VM does not support a virtualized FIE(like AMUs), it queries the host CPU frequency by reading a MMIO region of a virtual cpufreq device to update the guest's frequency scaling factor periodically. This enables accurate Per-Entity Load Tracking for tasks running in the guest. Co-developed-by: Saravana Kannan <saravanak@google.com> Signed-off-by: Saravana Kannan <saravanak@google.com> Signed-off-by: David Dai <davidai@google.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2024-10-28fork: only invoke khugepaged, ksm hooks if no errorLorenzo Stoakes
There is no reason to invoke these hooks early against an mm that is in an incomplete state. The change in commit d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()") makes this more pertinent as we may be in a state where entries in the maple tree are not yet consistent. Their placement early in dup_mmap() only appears to have been meaningful for early error checking, and since functionally it'd require a very small allocation to fail (in practice 'too small to fail') that'd only occur in the most dire circumstances, meaning the fork would fail or be OOM'd in any case. Since both khugepaged and KSM tracking are there to provide optimisations to memory performance rather than critical functionality, it doesn't really matter all that much if, under such dire memory pressure, we fail to register an mm with these. As a result, we follow the example of commit d2081b2bf819 ("mm: khugepaged: make khugepaged_enter() void function") and make ksm_fork() a void function also. We only expose the mm to these functions once we are done with them and only if no error occurred in the fork operation. Link: https://lkml.kernel.org/r/e0cb8b840c9d1d5a6e84d4f8eff5f3f2022aa10c.1729014377.git.lorenzo.stoakes@oracle.com Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by: Jann Horn <jannh@google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Jann Horn <jannh@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Linus Torvalds <torvalds@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-28fork: do not invoke uffd on fork if error occursLorenzo Stoakes
Patch series "fork: do not expose incomplete mm on fork". During fork we may place the virtual memory address space into an inconsistent state before the fork operation is complete. In addition, we may encounter an error during the fork operation that indicates that the virtual memory address space is invalidated. As a result, we should not be exposing it in any way to external machinery that might interact with the mm or VMAs, machinery that is not designed to deal with incomplete state. We specifically update the fork logic to defer khugepaged and ksm to the end of the operation and only to be invoked if no error arose, and disallow uffd from observing fork events should an error have occurred. This patch (of 2): Currently on fork we expose the virtual address space of a process to userland unconditionally if uffd is registered in VMAs, regardless of whether an error arose in the fork. This is performed in dup_userfaultfd_complete() which is invoked unconditionally, and performs two duties - invoking registered handlers for the UFFD_EVENT_FORK event via dup_fctx(), and clearing down userfaultfd_fork_ctx objects established in dup_userfaultfd(). This is problematic, because the virtual address space may not yet be correctly initialised if an error arose. The change in commit d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()") makes this more pertinent as we may be in a state where entries in the maple tree are not yet consistent. We address this by, on fork error, ensuring that we roll back state that we would otherwise expect to clean up through the event being handled by userland and perform the memory freeing duty otherwise performed by dup_userfaultfd_complete(). We do this by implementing a new function, dup_userfaultfd_fail(), which performs the same loop, only decrementing reference counts. Note that we perform mmgrab() on the parent and child mm's, however userfaultfd_ctx_put() will mmdrop() this once the reference count drops to zero, so we will avoid memory leaks correctly here. Link: https://lkml.kernel.org/r/cover.1729014377.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/d3691d58bb58712b6fb3df2be441d175bd3cdf07.1729014377.git.lorenzo.stoakes@oracle.com Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by: Jann Horn <jannh@google.com> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Linus Torvalds <torvalds@linuxfoundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-29usb: storage: fix wrong comments for struct bulk_cb_wrapDingyan Li
In the flags, direction is in bit 7 instead of bit 0 based on the specification. Signed-off-by: Dingyan Li <18500469033@163.com> Link: https://lore.kernel.org/r/20241020074721.26905-1-18500469033@163.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-28Merge branch 'cxl/for-6.12/printf' into cxl-for-nextDave Jiang
Add support for adding a printf specifier '$pra' to emit 'struct range' content.
2024-10-28New implementation for IO memcpy and IO memsetJulian Vetter
The IO memcpy and IO memset functions in asm-generic/io.h simply call memcpy and memset. This can lead to alignment problems or faults on architectures that do not define their own version and fall back to these defaults. This patch introduces new implementations for IO memcpy and IO memset, that use read{l,q} accessor functions, align accesses to machine word size, and resort to byte accesses when the target memory is not aligned. For new architectures and existing ones that were using the old fallbacks these functions are save to use, because IO memory constraints are taken into account. Moreover, architectures with similar implementations can now use these new versions, not needing to implement their own. Reviewed-by: Yann Sionneau <ysionneau@kalrayinc.com> Signed-off-by: Julian Vetter <jvetter@kalrayinc.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-10-28__arch_xprod64(): make __always_inline when optimizing for performanceNicolas Pitre
Recent gcc versions started not systematically inline __arch_xprod64() and that has performance implications. Give the compiler the freedom to decide only when optimizing for size. Here's some timing numbers from lib/math/test_div64.c Using __always_inline: ``` test_div64: Starting 64bit/32bit division and modulo test test_div64: Completed 64bit/32bit division and modulo test, 0.048285584s elapsed ``` Without __always_inline: ``` test_div64: Starting 64bit/32bit division and modulo test test_div64: Completed 64bit/32bit division and modulo test, 0.053023584s elapsed ``` Forcing constant base through the non-constant base code path: ``` test_div64: Starting 64bit/32bit division and modulo test test_div64: Completed 64bit/32bit division and modulo test, 0.103263776s elapsed ``` It is worth noting that test_div64 does half the test with non constant divisors already so the impact is greater than what those numbers show. And for what it is worth, those numbers were obtained using QEMU. The gcc version is 14.1.0. Signed-off-by: Nicolas Pitre <npitre@baylibre.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-10-28asm-generic/div64: optimize/simplify __div64_const32()Nicolas Pitre
Several years later I just realized that this code could be greatly simplified. First, let's formalize the need for overflow handling in __arch_xprod64(). Assuming n = UINT64_MAX, there are 2 cases where an overflow may occur: 1) If a bias must be added, we have m_lo * n_lo + m or m_lo * 0xffffffff + ((m_hi << 32) + m_lo) or ((m_lo << 32) - m_lo) + ((m_hi << 32) + m_lo) or (m_lo + m_hi) << 32 which must be < (1 << 64). So the criteria for no overflow is m_lo + m_hi < (1 << 32). 2) The cross product m_lo * n_hi + m_hi * n_lo or m_lo * 0xffffffff + m_hi * 0xffffffff or ((m_lo << 32) - m_lo) + ((m_hi << 32) - m_hi). Assuming the top result from the previous step (m_lo + m_hi) that must be added to this, we get (m_lo + m_hi) << 32 again. So let's have a straight and simpler version when this is true. Otherwise some reordering allows for taking care of possible overflows without any actual conditionals. And prevent from generating both code variants by making sure this is considered only if m is perceived as constant by the compiler. This, in turn, allows for greatly simplifying __div64_const32(). The "special case" may go as well as the regular case works just fine without needing a bias. Then reduction should be applied all the time as minimizing m is the key. Signed-off-by: Nicolas Pitre <npitre@baylibre.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-10-28asm-generic: add an optional pfn_valid check to page_to_physChristoph Hellwig
page_to_pfn is usually implemented by pointer arithmetics on the memory map, which means that bogus input can lead to even more bogus output. Powerpc had a pfn_valid check on the intermediate pfn in the page_to_phys implementation when CONFIG_DEBUG_VIRTUAL is defined, which seems generally useful, so add that to the generic version. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-10-28asm-generic: provide generic page_to_phys and phys_to_page implementationsChristoph Hellwig
page_to_phys is duplicated by all architectures, and from some strange reason placed in <asm/io.h> where it doesn't fit at all. phys_to_page is only provided by a few architectures despite having a lot of open coded users. Provide generic versions in <asm-generic/memory_model.h> to make these helpers more easily usable. Note with this patch powerpc loses the CONFIG_DEBUG_VIRTUAL pfn_valid check. It will be added back in a generic version later. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-10-28asm-generic/io.h: Remove I/O port accessors for HAS_IOPORT=nNiklas Schnelle
With all subsystems and drivers either declaring their dependence on HAS_IOPORT or fencing I/O port specific code sections we can finally make inb()/outb() and friends compile-time dependent on HAS_IOPORT as suggested by Linus in the linked mail. The main benefit of this is that on platforms such as s390 which have no meaningful way of implementing inb()/outb() their use without the proper HAS_IOPORT dependency will result in easy to catch and fix compile-time errors instead of compiling code that can never work. Link: https://lore.kernel.org/lkml/CAHk-=wg80je=K7madF4e7WrRNp37e3qh6y10Svhdc7O8SZ_-8g@mail.gmail.com/ Co-developed-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Maciej W. Rozycki <macro@orcam.me.uk> Acked-by: Damien Le Moal <dlemoal@kernel.org> Acked-by: Jaroslav Kysela <perex@perex.cz> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> # for ARCH=um Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Kalle Valo <kvalo@kernel.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Sebastian Reichel <sre@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-10-28tty: serial: handle HAS_IOPORT dependenciesNiklas Schnelle
In a future patch HAS_IOPORT=n will disable inb()/outb() and friends at compile time. We thus need to add HAS_IOPORT as dependency for those drivers using them unconditionally. Some 8250 serial drivers support MMIO only use, so fence only the parts requiring I/O ports and print an error message if a device can't be supported with the current configuration. Co-developed-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Arnd Bergmann <arnd@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-10-28UAPI/ioctl: Improve parameter name of ioctl request definition helpersUwe Kleine-König
The third parameter to _IOR et al is a type name, not a size. So the parameter being named "size" is irritating. Rename it to "argtype" instead to reduce confusion. There is a very minor chance that this breaks stuff. It only hurts however if there is a variable (or macro) in userspace that is called "argtype" *and* it's used in the parameters of _IOR and friends. IMHO this is negligible because usually definitions making use of these macros are provided by kernel headers (i.e. us) or if they are replicated in userspace code, they are replicated and so supposed to match the kernel definitions (e.g. to make them usable by programs without the need to update the kernel headers used to compile the program). Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-10-28printf: Add print format (%pra) for struct rangeIra Weiny
The use of struct range in the CXL subsystem is growing. In particular, the addition of Dynamic Capacity devices uses struct range in a number of places which are reported in debug and error messages. To wit requiring the printing of the start/end fields in each print became cumbersome. Dan Williams mentions in [1] that it might be time to have a print specifier for struct range similar to struct resource. A few alternatives were considered including '%par', '%r', and '%pn'. %pra follows that struct range is similar to struct resource (%p[rR]) but needs to be different. Based on discussions with Petr and Andy '%pra' was chosen.[2] Andy also suggested to keep the range prints similar to struct resource though combined code. Add hex_range() to handle printing for both pointer types. Finally introduce DEFINE_RANGE() as a parallel to DEFINE_RES_*() and use it in the tests. Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: open list <linux-kernel@vger.kernel.org> Link: https://lore.kernel.org/all/663922b475e50_d54d72945b@dwillia2-xfh.jf.intel.com.notmuch/ [1] Link: https://lore.kernel.org/all/66cea3bf3332f_f937b29424@iweiny-mobl.notmuch/ [2] Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20241025-cxl-pra-v2-3-123a825daba2@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-10-28drm/amdkfd: flag per-queue reset support for gfx9Jonathan Kim
Flag KFD support for per-queue reset on GFX9 devices. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-28iio: acpi: Add iio_get_acpi_device_name_and_data() helper functionAndy Shevchenko
A few drivers duplicate the code to retrieve ACPI device instance name. Some of them want an associated driver data as well. In order of deduplication introduce the common helper functions. Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20241024191200.229894-6-andriy.shevchenko@linux.intel.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-28perf/marvell: Marvell PEM performance monitor supportGowthami Thiagarajan
PCI Express Interface PMU includes various performance counters to monitor the data that is transmitted over the PCIe link. The counters track various inbound and outbound transactions which includes separate counters for posted/non-posted/completion TLPs. Also, inbound and outbound memory read requests along with their latencies can also be monitored. Address Translation Services(ATS)events such as ATS Translation, ATS Page Request, ATS Invalidation along with their corresponding latencies are also supported. The performance counters are 64 bits wide. For instance, perf stat -e ib_tlp_pr <workload> tracks the inbound posted TLPs for the workload. Co-developed-by: Linu Cherian <lcherian@marvell.com> Signed-off-by: Linu Cherian <lcherian@marvell.com> Signed-off-by: Gowthami Thiagarajan <gthiagarajan@marvell.com> Link: https://lore.kernel.org/r/20241028055309.17893-1-gthiagarajan@marvell.com Signed-off-by: Will Deacon <will@kernel.org>
2024-10-28perf/arm_pmuv3: Add PMUv3.9 per counter EL0 access controlRob Herring (Arm)
Armv8.9/9.4 PMUv3.9 adds per counter EL0 access controls. Per counter access is enabled with the UEN bit in PMUSERENR_EL1 register. Individual counters are enabled/disabled in the PMUACR_EL1 register. When UEN is set, the CR/ER bits control EL0 write access and must be set to disable write access. With the access controls, the clearing of unused counters can be skipped. KVM also configures PMUSERENR_EL1 in order to trap to EL2. UEN does not need to be set for it since only PMUv3.5 is exposed to guests. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20241002184326.1105499-1-robh@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-10-28kernel/range: Const-ify range_contains parametersIra Weiny
range_contains() does not modify the range values. David suggested it is safer to keep those parameters as const.[1] Make range parameters const Link: https://lore.kernel.org/all/20241008161032.GB1609@twin.jikos.cz/ [1] Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: David Sterba <dsterba@suse.com> Link: https://patch.msgid.link/20241010-const-range-v1-1-afb6e4bfd8ce@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-10-28iommufd: Add IOMMU_IOAS_MAP_FILESteve Sistare
Define the IOMMU_IOAS_MAP_FILE ioctl interface, which allows a user to register memory by passing a memfd plus offset and length. Implement it using the memfd_pin_folios() kAPI. Link: https://patch.msgid.link/r/1729861919-234514-8-git-send-email-steven.sistare@oracle.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-28mm/gup: Add folio_add_pins()Steve Sistare
Export a function that adds pins to an already-pinned huge-page folio. This allows any range of small pages within the folio to be unpinned later. For example, pages pinned via memfd_pin_folios and modified by folio_add_pins could be unpinned via unpin_user_page(s). Link: https://patch.msgid.link/r/1729861919-234514-2-git-send-email-steven.sistare@oracle.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-28vdso: Change PAGE_MASK to signed on all 32-bit architecturesArnd Bergmann
With the introduction of an architecture-independent defintion of PAGE_MASK, we had to make a choice between defining it as 'unsigned long' as on 64-bit architectures, or as signed 'long' as required for architectures with a 64-bit phys_addr_t. To reduce the risk for regressions and minimize the changes in behavior, the result was using the signed value only when CONFIG_PHYS_ADDR_T_64BIT is set, but that ended up causing a regression after all in the early_init_dt_add_memory_arch() function that uses 64-bit integers for address calculation. Presumably the same regression also affects mips32 and powerpc32 when dealing with large amounts of memory on DT platforms: like arm32, they were using the signed version unconditionally. The two most sensible options for addressing the regression are either to go back to an architecture specific definition, using a signed constant on arm/powerpc/mips and unsigned on the others, or to use the same definition everywhere. Use the simpler of those two and change them all to the signed version, in the hope that this does not cause a different type of bug. Most of the other 32-bit architectures have no large physical address support and are rarely used, so it seems more likely that using the same definition helps than hurts here. In particular, x86-32 does have physical addressing extensions, so it already changed to the signed version after the previous patch, so it makes sense to use the same version on non-PAE as well. Fixes: efe8419ae78d ("vdso: Introduce vdso/page.h") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Anders Roxell <anders.roxell@linaro.org> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lore.kernel.org/all/20241024133447.3117273-1-arnd@kernel.org Link: https://lore.kernel.org/lkml/CA+G9fYt86bUAu_v5dXPWnDUwQNVipj+Wq3Djir1KUSKdr9QLNg@mail.gmail.com/
2024-10-28media: v4l2-core: v4l2-dv-timings: check cvt/gtf resultHans Verkuil
The v4l2_detect_cvt/gtf functions should check the result against the timing capabilities: these functions calculate the timings, so if they are out of bounds, they should be rejected. To do this, add the struct v4l2_dv_timings_cap as argument to those functions. This required updates to the adv7604 and adv7842 drivers since the prototype of these functions has now changed. The timings struct that is passed to v4l2_detect_cvt/gtf in those two drivers is filled with the timings detected by the hardware. The vivid driver was also updated, but an additional check was added: the width and height specified by VIDIOC_S_DV_TIMINGS has to match the calculated result, otherwise something went wrong. Note that vivid *emulates* hardware, so all the values passed to the v4l2_detect_cvt/gtf functions came from the timings struct that was filled by userspace and passed on to the driver via VIDIOC_S_DV_TIMINGS. So these fields can contain random data. Both the constraints check via struct v4l2_dv_timings_cap and the additional width/height check ensure that the resulting timings are sane and not messed up by the v4l2_detect_cvt/gtf calculations. Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Fixes: 2576415846bc ("[media] v4l2: move dv-timings related code to v4l2-dv-timings.c") Cc: stable@vger.kernel.org Reported-by: syzbot+a828133770f62293563e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-media/000000000000013050062127830a@google.com/
2024-10-28tmpfs: Add flag FS_CASEFOLD_FL support for tmpfs dirsAndré Almeida
Enable setting flag FS_CASEFOLD_FL for tmpfs directories, when tmpfs is mounted with casefold support. A special check is need for this flag, since it can't be set for non-empty directories. Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Signed-off-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20241021-tonyk-tmpfs-v8-7-f443d5814194@igalia.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-28libfs: Export generic_ci_ dentry functionsAndré Almeida
Export generic_ci_ dentry functions so they can be used by case-insensitive filesystems that need something more custom than the default one set by `struct generic_ci_dentry_ops`. Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Signed-off-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20241021-tonyk-tmpfs-v8-5-f443d5814194@igalia.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-28unicode: Recreate utf8_parse_version()André Almeida
All filesystems that currently support UTF-8 casefold can fetch the UTF-8 version from the filesystem metadata stored on disk. They can get the data stored and directly match it to a integer, so they can skip the string parsing step, which motivated the removal of this function in the first place. However, for tmpfs, the only way to tell the kernel which UTF-8 version we are about to use is via mount options, using a string. Re-introduce utf8_parse_version() to be used by tmpfs. This version differs from the original by skipping the intermediate step of copying the version string to an auxiliary string before calling match_token(). This versions calls match_token() in the argument string. The paramenters are simpler now as well. utf8_parse_version() was created by 9d53690f0d4 ("unicode: implement higher level API for string handling") and later removed by 49bd03cc7e9 ("unicode: pass a UNICODE_AGE() tripple to utf8_load"). Signed-off-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20241021-tonyk-tmpfs-v8-4-f443d5814194@igalia.com Reviewed-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-28unicode: Export latest available UTF-8 version numberAndré Almeida
Export latest available UTF-8 version number so filesystems can easily load the newest one. Signed-off-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20241021-tonyk-tmpfs-v8-3-f443d5814194@igalia.com Acked-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-28libfs: Create the helper function generic_ci_validate_strict_name()André Almeida
Create a helper function for filesystems do the checks required for casefold directories and strict encoding. Suggested-by: Gabriel Krisman Bertazi <krisman@suse.de> Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Signed-off-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20241021-tonyk-tmpfs-v8-1-f443d5814194@igalia.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-28fs/writeback: convert wbc_account_cgroup_owner to take a folioPankaj Raghav
Most of the callers of wbc_account_cgroup_owner() are converting a folio to page before calling the function. wbc_account_cgroup_owner() is converting the page back to a folio to call mem_cgroup_css_from_folio(). Convert wbc_account_cgroup_owner() to take a folio instead of a page, and convert all callers to pass a folio directly except f2fs. Convert the page to folio for all the callers from f2fs as they were the only callers calling wbc_account_cgroup_owner() with a page. As f2fs is already in the process of converting to folios, these call sites might also soon be calling wbc_account_cgroup_owner() with a folio directly in the future. No functional changes. Only compile tested. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20240926140121.203821-1-kernel@pankajraghav.com Acked-by: David Sterba <dsterba@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-28ASoC: soc-devres: Remove unused devm_snd_soc_register_daiDr. David Alan Gilbert
The last use of devm_snd_soc_register_dai() was removed by commit fc4cb1e15f0c ("ASoC: topology: Properly unregister DAI on removal") in 2021. Remove it, and the helper it used. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patch.msgid.link/20241028021226.477909-1-linux@treblig.org Signed-off-by: Mark Brown <broonie@kernel.org>
2024-10-27posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on cloneBenjamin Segall
When cloning a new thread, its posix_cputimers are not inherited, and are cleared by posix_cputimers_init(). However, this does not clear the tick dependency it creates in tsk->tick_dep_mask, and the handler does not reach the code to clear the dependency if there were no timers to begin with. Thus if a thread has a cputimer running before clone/fork, all descendants will prevent nohz_full unless they create a cputimer of their own. Fix this by entirely clearing the tick_dep_mask in copy_process(). (There is currently no inherited state that needs a tick dependency) Process-wide timers do not have this problem because fork does not copy signal_struct as a baseline, it creates one from scratch. Fixes: b78783000d5c ("posix-cpu-timers: Migrate to use new tick dependency mask model") Signed-off-by: Ben Segall <bsegall@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/xm26o737bq8o.fsf@google.com
2024-10-26block: model freeze & enter queue as lock for supporting lockdepMing Lei
Recently we got several deadlock report[1][2][3] caused by blk_mq_freeze_queue and blk_enter_queue(). Turns out the two are just like acquiring read/write lock, so model them as read/write lock for supporting lockdep: 1) model q->q_usage_counter as two locks(io and queue lock) - queue lock covers sync with blk_enter_queue() - io lock covers sync with bio_enter_queue() 2) make the lockdep class/key as per-queue: - different subsystem has very different lock use pattern, shared lock class causes false positive easily - freeze_queue degrades to no lock in case that disk state becomes DEAD because bio_enter_queue() won't be blocked any more - freeze_queue degrades to no lock in case that request queue becomes dying because blk_enter_queue() won't be blocked any more 3) model blk_mq_freeze_queue() as acquire_exclusive & try_lock - it is exclusive lock, so dependency with blk_enter_queue() is covered - it is trylock because blk_mq_freeze_queue() are allowed to run concurrently 4) model blk_enter_queue() & bio_enter_queue() as acquire_read() - nested blk_enter_queue() are allowed - dependency with blk_mq_freeze_queue() is covered - blk_queue_exit() is often called from other contexts(such as irq), and it can't be annotated as lock_release(), so simply do it in blk_enter_queue(), this way still covered cases as many as possible With lockdep support, such kind of reports may be reported asap and needn't wait until the real deadlock is triggered. For example, lockdep report can be triggered in the report[3] with this patch applied. [1] occasional block layer hang when setting 'echo noop > /sys/block/sda/queue/scheduler' https://bugzilla.kernel.org/show_bug.cgi?id=219166 [2] del_gendisk() vs blk_queue_enter() race condition https://lore.kernel.org/linux-block/20241003085610.GK11458@google.com/ [3] queue_freeze & queue_enter deadlock in scsi https://lore.kernel.org/linux-block/ZxG38G9BuFdBpBHZ@fedora/T/#u Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241025003722.3630252-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-26blk-mq: add non_owner variant of start_freeze/unfreeze queue APIsMing Lei
Add non_owner variant of start_freeze/unfreeze queue APIs, so that the caller knows that what they are doing, and we can skip lockdep support for non_owner variant in per-call level. Prepare for supporting lockdep for freezing/unfreezing queue. Reviewed-by: Christoph Hellwig <hch@lst.de> Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241025003722.3630252-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-26Merge branch 'for-v6.13/clk-dt-bindings' into next/clkKrzysztof Kozlowski
2024-10-26dt-bindings: clock: samsung: Add Exynos8895 SoCIvaylo Ivanov
Provide dt-schema documentation for Samsung Exynos8895 SoC clock controller CMU blocks: - CMU_FSYS0/1 - CMU_PERIC0/1 - CMU_PERIS - CMU_TOP Signed-off-by: Ivaylo Ivanov <ivo.ivanov.ivanov1@gmail.com> Link: https://lore.kernel.org/r/20241023090136.537395-2-ivo.ivanov.ivanov1@gmail.com Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2024-10-25cxl/port: Fix use-after-free, permit out-of-order decoder shutdownDan Williams
In support of investigating an initialization failure report [1], cxl_test was updated to register mock memory-devices after the mock root-port/bus device had been registered. That led to cxl_test crashing with a use-after-free bug with the following signature: cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1 cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1 cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0 1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1 [..] cxld_unregister: cxl decoder14.0: cxl_region_decode_reset: cxl_region region3: mock_decoder_reset: cxl_port port3: decoder3.0 reset 2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1 cxl_endpoint_decoder_release: cxl decoder14.0: [..] cxld_unregister: cxl decoder7.0: 3) cxl_region_decode_reset: cxl_region region3: Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI [..] RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core] [..] Call Trace: <TASK> cxl_region_decode_reset+0x69/0x190 [cxl_core] cxl_region_detach+0xe8/0x210 [cxl_core] cxl_decoder_kill_region+0x27/0x40 [cxl_core] cxld_unregister+0x5d/0x60 [cxl_core] At 1) a region has been established with 2 endpoint decoders (7.0 and 14.0). Those endpoints share a common switch-decoder in the topology (3.0). At teardown, 2), decoder14.0 is the first to be removed and hits the "out of order reset case" in the switch decoder. The effect though is that region3 cleanup is aborted leaving it in-tact and referencing decoder14.0. At 3) the second attempt to teardown region3 trips over the stale decoder14.0 object which has long since been deleted. The fix here is to recognize that the CXL specification places no mandate on in-order shutdown of switch-decoders, the driver enforces in-order allocation, and hardware enforces in-order commit. So, rather than fail and leave objects dangling, always remove them. In support of making cxl_region_decode_reset() always succeed, cxl_region_invalidate_memregion() failures are turned into warnings. Crashing the kernel is ok there since system integrity is at risk if caches cannot be managed around physical address mutation events like CXL region destruction. A new device_for_each_child_reverse_from() is added to cleanup port->commit_end after all dependent decoders have been disabled. In other words if decoders are allocated 0->1->2 and disabled 1->2->0 then port->commit_end only decrements from 2 after 2 has been disabled, and it decrements all the way to zero since 1 was disabled previously. Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] Cc: stable@vger.kernel.org Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/172964782781.81806.17902885593105284330.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-10-25scsi: ufs: core: Simplify ufshcd_err_handling_prepare()Bart Van Assche
Use blk_mq_quiesce_tagset() instead of ufshcd_scsi_block_requests() and blk_mq_wait_quiesce_done(). Since this patch removes the last callers of ufshcd_scsi_block_requests() and ufshcd_scsi_unblock_requests(), remove these functions. Reviewed-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20241022193130.2733293-6-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-10-25scsi: ufs: core: Move the ufshcd_mcq_enable_esi() definitionBart Van Assche
Move the ufshcd_mcq_enable_esi() definition such that it occurs immediately before the ufshcd_mcq_config_esi() definition. Reviewed-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20241022193130.2733293-2-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-10-25scsi: ufs: core: Make DMA mask configuration more flexibleBart Van Assche
Replace UFSHCD_QUIRK_BROKEN_64BIT_ADDRESS with ufs_hba_variant_ops::set_dma_mask. Update the Renesas driver accordingly. This patch enables supporting other configurations than 32-bit or 64-bit DMA addresses, e.g. 36-bit DMA addresses. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20241018194753.775074-1-bvanassche@acm.org Reviewed-by: Avri Altman <Avri.Altman@wdc.com> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-10-25Merge tag 'fuse-fixes-6.12-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: - Fix cached size after passthrough writes This fix needed a trivial change in the backing-file API, which resulted in some non-fuse files being touched. - Revert a commit meant as a cleanup but which triggered a WARNING - Remove a stray debug line left-over * tag 'fuse-fixes-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: remove stray debug line Revert "fuse: move initialization of fuse_file to fuse_writepages() instead of in callback" fuse: update inode size after extending passthrough write fs: pass offset and result to backing_file end_write() callback