summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2025-03-03Merge tag 'v6.14-rc5' of ↵Bartosz Golaszewski
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into gpio/for-next Linux 6.14-rc5
2025-03-02hwmon: (k10temp) add support for cyan skillfishMikhail Paulyshka
Add support for Cyan Skillfish (AMD Family 17h Model 47h), which appear to be Zen 2 based APU. The patch was tested with an AMD BC-250 board. Signed-off-by: Mikhail Paulyshka <me@mixaill.net> Link: https://lore.kernel.org/r/20250302155009.49951-1-me@mixaill.net Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-03-02cred: Fix RCU warnings in override/revert_credsHerbert Xu
Fix RCU warnings in override_creds and revert_creds by turning the RCU pointer into a normal pointer using rcu_replace_pointer. These warnings were previously private to the cred code, but due to the move into the header file they are now polluting unrelated subsystems. Fixes: 49dffdfde462 ("cred: Add a light version of override/revert_creds()") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/Z8QGQGW0IaSklKG7@gondor.apana.org.au Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-01Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "Ryan's been hard at work finding and fixing mm bugs in the arm64 code, so here's a small crop of fixes for -rc5. The main changes are to fix our zapping of non-present PTEs for hugetlb entries created using the contiguous bit in the page-table rather than a block entry at the level above. Prior to these fixes, we were pulling the contiguous bit back out of the PTE in order to determine the size of the hugetlb page but this is clearly bogus if the thing isn't present and consequently both the clearing of the PTE(s) and the TLB invalidation were unreliable. Although the problem was found by code inspection, we really don't want this sitting around waiting to trigger and the changes are CC'd to stable accordingly. Note that the diffstat looks a lot worse than it really is; huge_ptep_get_and_clear() now takes a size argument from the core code and so all the arch implementations of that have been updated in a pretty mechanical fashion. - Fix a sporadic boot failure due to incorrect randomization of the linear map on systems that support it - Fix the zapping (both clearing the entries *and* invalidating the TLB) of hugetlb PTEs constructed using the contiguous bit" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: hugetlb: Fix flush_hugetlb_tlb_range() invalidation level arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear() arm64/mm: Fix Boot panic on Ampere Altra
2025-03-01io.h: drop unused headersRaag Jadav
Drop unused headers and type declaration from io.h. Signed-off-by: Raag Jadav <raag.jadav@intel.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-03-01asm-generic/io.h: rework split ioread64/iowrite64 helpersArnd Bergmann
There are two incompatible sets of definitions of these eight functions: On 64-bit architectures setting CONFIG_HAS_IOPORT, they turn into either pair of 32-bit PIO (inl/outl) accesses or a single 64-bit MMIO (readq/writeq). On other 64-bit architectures, they are always split into 32-bit accesses. Depending on which header gets included in a driver, there are additionally definitions for ioread64()/iowrite64() that are expected to produce a 64-bit register MMIO access on all 64-bit architectures. To separate the conflicting definitions, make the version in include/linux/io-64-nonatomic-*.h visible on all architectures but pick the one from lib/iomap.c on architectures that set CONFIG_GENERIC_IOMAP in place of the default fallback. Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-03-01Merge branch 'perf/urgent' into perf/core, to pick up dependent patches and ↵Ingo Molnar
fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - Fix TCR_EL2 configuration to not use the ASID in TTBR1_EL2 and not mess-up T1SZ/PS by using the HCR_EL2.E2H==0 layout. - Bring back the VMID allocation to the vcpu_load phase, ensuring that we only setup VTTBR_EL2 once on VHE. This cures an ugly race that would lead to running with an unallocated VMID. RISC-V: - Fix hart status check in SBI HSM extension - Fix hart suspend_type usage in SBI HSM extension - Fix error returned by SBI IPI and TIME extensions for unsupported function IDs - Fix suspend_type usage in SBI SUSP extension - Remove unnecessary vcpu kick after injecting interrupt via IMSIC guest file x86: - Fix an nVMX bug where KVM fails to detect that, after nested VM-Exit, L1 has a pending IRQ (or NMI). - To avoid freeing the PIC while vCPUs are still around, which would cause a NULL pointer access with the previous patch, destroy vCPUs before any VM-level destruction. - Handle failures to create vhost_tasks" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: retry nx_huge_page_recovery_thread creation vhost: return task creation error instead of NULL KVM: nVMX: Process events on nested VM-Exit if injectable IRQ or NMI is pending KVM: x86: Free vCPUs before freeing VM state riscv: KVM: Remove unnecessary vcpu kick KVM: arm64: Ensure a VMID is allocated before programming VTTBR_EL2 KVM: arm64: Fix tcr_el2 initialisation in hVHE mode riscv: KVM: Fix SBI sleep_type use riscv: KVM: Fix SBI TIME error generation riscv: KVM: Fix SBI IPI error generation riscv: KVM: Fix hart suspend_type use riscv: KVM: Fix hart suspend status check
2025-03-01fs: place f_ref to 3rd cache line in struct file to resolve false sharingPan Deng
When running syscall pread in a high core count system, f_ref contends with the reading of f_mode, f_op, f_mapping, f_inode, f_flags in the same cache line. This change places f_ref to the 3rd cache line where fields are not updated as frequently as the 1st cache line, and the contention is grealy reduced according to tests. In addition, the size of file object is kept in 3 cache lines. This change has been tested with rocksdb benchmark readwhilewriting case in 1 socket 64 physical core 128 logical core baremetal machine, with build config CONFIG_RANDSTRUCT_NONE=y Command: ./db_bench --benchmarks="readwhilewriting" --threads $cnt --duration 60 The throughput(ops/s) is improved up to ~21%. ===== thread baseline compare 16 100% +1.3% 32 100% +2.2% 64 100% +7.2% 128 100% +20.9% It was also tested with UnixBench: syscall, fsbuffer, fstime, fsdisk cases that has been used for file struct layout tuning, no regression was observed. Signed-off-by: Pan Deng <pan.deng@intel.com> Link: https://lore.kernel.org/r/20250228020059.3023375-1-pan.deng@intel.com Tested-by: Lipeng Zhu <lipeng.zhu@intel.com> Reviewed-by: Tianyou Li <tianyou.li@intel.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-01kvm: retry nx_huge_page_recovery_thread creationKeith Busch
A VMM may send a non-fatal signal to its threads, including vCPU tasks, at any time, and thus may signal vCPU tasks during KVM_RUN. If a vCPU task receives the signal while its trying to spawn the huge page recovery vhost task, then KVM_RUN will fail due to copy_process() returning -ERESTARTNOINTR. Rework call_once() to mark the call complete if and only if the called function succeeds, and plumb the function's true error code back to the call_once() invoker. This provides userspace with the correct, non-fatal error code so that the VMM doesn't terminate the VM on -ENOMEM, and allows subsequent KVM_RUN a succeed by virtue of retrying creation of the NX huge page task. Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> [implemented the kvm user side] Signed-off-by: Keith Busch <kbusch@kernel.org> Message-ID: <20250227230631.303431-3-kbusch@meta.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-03-01perf: arm_pmu: Move PMUv3-specific dataMark Rutland
A few fields in struct arm_pmu are only used with PMUv3, and soon we will need to add more for BRBE. Group the fields together so that we have a logical place to add more data in future. At the same time, remove the comment for reg_pmmir as it doesn't convey anything useful. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Tested-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250218-arm-brbe-v19-v20-7-4e9922fc2e8e@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2025-02-28io_uring/ublk: report error when unregister operation failsCaleb Sander Mateos
Indicate to userspace applications if a UBLK_IO_UNREGISTER_IO_BUF command specifies an invalid buffer index by returning an error code. Return -EINVAL if no buffer is registered with the given index, and -EBUSY if the registered buffer is not a kernel bvec. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250228231432.642417-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28io_uring: convert cmd_to_io_kiocb() macro to functionCaleb Sander Mateos
The cmd_to_io_kiocb() macro applies a pointer cast to its input without parenthesizing it. Currently all inputs are variable names, so this has the intended effect. But since casts have relatively high precedence, the macro would apply the cast to the wrong value if the input was a pointer addition, for example. Turn the macro into a static inline function to ensure the pointer cast is applied to the full input value. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250228230305.630885-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28io_uring/uring_cmd: specify io_uring_cmd_import_fixed() pointer typeCaleb Sander Mateos
io_uring_cmd_import_fixed() takes a struct io_uring_cmd *, but the type of the ioucmd parameter is void *. Make the pointer type explicit so the compiler can type check it. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250228221514.604350-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28Merge tag 'objtool-urgent-2025-02-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Ingo Molnar: "Fix an objtool false positive, and objtool related build warnings that happens on PIE-enabled architectures such as LoongArch" * tag 'objtool-urgent-2025-02-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Add bch2_trans_unlocked_or_in_restart_error() to bcachefs noreturns objtool: Fix C jump table annotations for Clang vmlinux.lds: Ensure that const vars with relocations are mapped R/O
2025-02-28Merge tag 'locking-urgent-2025-02-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: "Fix an rcuref_put() slowpath race" * tag 'locking-urgent-2025-02-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcuref: Plug slowpath race in rcuref_put()
2025-02-28pwm: Check for CONFIG_PWM using IS_REACHABLE() in main headerUwe Kleine-König
Preparing CONFIG_PWM becoming tristate the right magic to check for the availability of the pwm functions is using IS_REACHABLE() and not IS_ENABLED(). The latter gives the wrong result for built-in code with CONFIG_PWM=m. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://lore.kernel.org/r/20250217102504.687916-2-u.kleine-koenig@baylibre.com Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
2025-02-28driver core: Introduce device_{add,remove}_of_node()Herve Codina
An of_node can be set to a device using device_set_node(), which does not prevent any of_node and/or fwnode overwrites. When adding an of_node on an already present device, the following operations need to be done: - Attach the of_node only if no of_node is already attached - Attach the of_node as a fwnode if no fwnode were already attached This is the purpose of device_add_of_node(). device_remove_of_node() reverts the operations done by device_add_of_node(). Link: https://lore.kernel.org/r/20250224141356.36325-2-herve.codina@bootlin.com Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-28mm: security: Check early if HARDENED_USERCOPY is enabledMel Gorman
HARDENED_USERCOPY is checked within a function so even if disabled, the function overhead still exists. Move the static check inline. This is at best a micro-optimisation and any difference in performance was within noise but it is relatively consistent with the init_on_* implementations. Suggested-by: Kees Cook <kees@kernel.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Link: https://lore.kernel.org/r/20250123221115.19722-4-mgorman@techsingularity.net Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-28uaccess: Introduce ucopysize.hKees Cook
The object size sanity checking macros that uaccess.h and uio.h use have been living in thread_info.h for historical reasons. Needing to use jump labels for these checks, however, introduces a header include loop under certain conditions. The dependencies for the object checking macros are very limited, but they are used by separate header files, so introduce a new header that can be used directly by uaccess.h and uio.h. As a result, this also means thread_info.h (which is rather large) and be removed from those headers. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502281153.TG2XK5SI-lkp@intel.com/ Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-28net: stmmac: provide set_clk_tx_rate() hookRussell King (Oracle)
Several stmmac sub-drivers which support RGMII follow the same pattern. They calculate the transmit clock rate, and then call clk_set_rate(). Analysis of several implementation documents suggests that the platform is responsible for providing the transmit clock to the DWMAC core's clk_tx_i. The expected rates are: 10Mbps 100Mbps 1Gbps MII 2.5MHz 25MHz RMII 2.5MHz 25MHz GMII 125MHz RGMI 2.5MHz 25MHz 125MHz It seems some platforms require this clock to be manually configured, but there are outputs from the MAC core that indicate the speed, so a platform may use these to automatically configure the clock. Thus, we can't just provide one solution to configure this clock rate. Moreover, the clock may need to be derived from one of several sources depending on the interface mode. Provide a platform hook that is passed the transmit clock, interface mode and speed. Reviewed-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1tna0F-0052sS-Lr@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-28Merge tag 'block-6.14-20250228' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - Fix plugging for native zone writes - Fix segment limit settings for != 4K page size archs - Fix for slab names overflowing * tag 'block-6.14-20250228' of git://git.kernel.dk/linux: block: fix 'kmem_cache of name 'bio-108' already exists' block: Remove zone write plugs when handling native zone append writes block: make segment size limit workable for > 4K PAGE_SIZE
2025-02-28Convert regulator drivers to useMark Brown
Merge series from Raag Jadav <raag.jadav@intel.com>: This series converts regulator drivers to use the newly introduced[1] devm_kmemdup_array() helper. [1] https://lore.kernel.org/r/20250212062513.2254767-1-raag.jadav@intel.com
2025-02-28Convert sound drivers to use devm_kmemdup_array()Mark Brown
Merge series from Raag Jadav <raag.jadav@intel.com>: This series converts sound drivers to use the newly introduced[1] devm_kmemdup_array() helper. [1] https://lore.kernel.org/r/20250212062513.2254767-1-raag.jadav@intel.com
2025-02-28Merge tag 'for-joerg' of ↵Joerg Roedel
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd into core iommu shared branch with iommufd The three dependent series on a shared branch: - Change the iommufd fault handle into an always present hwpt handle in the domain - Give iommufd its own SW_MSI implementation along with some IRQ layer rework - Improvements to the handle attach API
2025-02-28io_uring: cache nodes and mapped buffersKeith Busch
Frequent alloc/free cycles on these is pretty costly. Use an io cache to more efficiently reuse these buffers. Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250227223916.143006-7-kbusch@meta.com [axboe: fix imu leak] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28io_uring: add support for kernel registered bvecsKeith Busch
Provide an interface for the kernel to leverage the existing pre-registered buffers that io_uring provides. User space can reference these later to achieve zero-copy IO. User space must register an empty fixed buffer table with io_uring in order for the kernel to make use of it. Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250227223916.143006-5-kbusch@meta.com Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28pmdomain: Merge tag regulator-devm-of-get into nextUlf Hansson
Merge the tag regulator-devm-of-get from git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator into next. This introduces devm_of_regulator_get without the _optional suffix, since that is more sensible for the Rockchip usecase. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-02-28pmdomain: Merge tag 'v6.14-rc4' from Linus into nextUlf Hansson
Linux 6.14-rc4
2025-02-28fs: Remove page_mkwrite_check_truncate()Matthew Wilcox (Oracle)
All callers of this function have now been converted to use folio_mkwrite_check_truncate(). Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250221204421.3590340-1-willy@infradead.org Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27net: qed: make 'qed_ll2_ops_pass' as __maybe_unusedArnd Bergmann
gcc warns about unused const variables even in header files when building with W=1: In file included from include/linux/qed/qed_rdma_if.h:14, from drivers/net/ethernet/qlogic/qed/qed_rdma.h:16, from drivers/net/ethernet/qlogic/qed/qed_cxt.c:23: include/linux/qed/qed_ll2_if.h:270:33: error: 'qed_ll2_ops_pass' defined but not used [-Werror=unused-const-variable=] 270 | static const struct qed_ll2_ops qed_ll2_ops_pass = { This one is intentional, so mark it as __maybe_unused to it can be included from a file that doesn't use this variable. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> # build-tested Link: https://patch.msgid.link/20250225200926.4057723-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-28Merge branch 'ib-amlogic-a4' into develLinus Walleij
Merge immutable branch into devel for next. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2025-02-28pinctrl: pinconf-generic: Add API for pinmux propertity in DTS fileXianwei Zhao
When describing pin mux func through pinmux propertity, a standard API is added for support. The pinmux contains pin identification and mux values, which can include multiple pins. And groups configuration use other word. DTS such as: func-name { group_alias: group-name{ pinmux= <pin_id << 8 | mux_value)>, <pin_id << 8 | mux_value)>; bias-pull-up; drive-strength-microamp = <4000>; }; }; Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com> Link: https://lore.kernel.org/20250212-amlogic-pinctrl-v5-2-282bc2516804@amlogic.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2025-02-27Change inode_operations.mkdir to return struct dentry *NeilBrown
Some filesystems, such as NFS, cifs, ceph, and fuse, do not have complete control of sequencing on the actual filesystem (e.g. on a different server) and may find that the inode created for a mkdir request already exists in the icache and dcache by the time the mkdir request returns. For example, if the filesystem is mounted twice the directory could be visible on the other mount before it is on the original mount, and a pair of name_to_handle_at(), open_by_handle_at() calls could instantiate the directory inode with an IS_ROOT() dentry before the first mkdir returns. This means that the dentry passed to ->mkdir() may not be the one that is associated with the inode after the ->mkdir() completes. Some callers need to interact with the inode after the ->mkdir completes and they currently need to perform a lookup in the (rare) case that the dentry is no longer hashed. This lookup-after-mkdir requires that the directory remains locked to avoid races. Planned future patches to lock the dentry rather than the directory will mean that this lookup cannot be performed atomically with the mkdir. To remove this barrier, this patch changes ->mkdir to return the resulting dentry if it is different from the one passed in. Possible returns are: NULL - the directory was created and no other dentry was used ERR_PTR() - an error occurred non-NULL - this other dentry was spliced in This patch only changes file-systems to return "ERR_PTR(err)" instead of "err" or equivalent transformations. Subsequent patches will make further changes to some file-systems to return a correct dentry. Not all filesystems reliably result in a positive hashed dentry: - NFS, cifs, hostfs will sometimes need to perform a lookup of the name to get inode information. Races could result in this returning something different. Note that this lookup is non-atomic which is what we are trying to avoid. Placing the lookup in filesystem code means it only happens when the filesystem has no other option. - kernfs and tracefs leave the dentry negative and the ->revalidate operation ensures that lookup will be called to correctly populate the dentry. This could be fixed but I don't think it is important to any of the users of vfs_mkdir() which look at the dentry. The recommendation to use d_drop();d_splice_alias() is ugly but fits with current practice. A planned future patch will change this. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27mm: Provide address mask in struct follow_pfnmap_argsAlex Williamson
follow_pfnmap_start() walks the page table for a given address and fills out the struct follow_pfnmap_args in pfnmap_args_setup(). The address mask of the page table level is already provided to this latter function for calculating the pfn. This address mask can also be useful for the caller to determine the extent of the contiguous mapping. For example, vfio-pci now supports huge_fault for pfnmaps and is able to insert pud and pmd mappings. When we DMA map these pfnmaps, ex. PCI MMIO BARs, we iterate follow_pfnmap_start() to get each pfn to test for a contiguous pfn range. Providing the mapping address mask allows us to skip the extent of the mapping level. Assuming a 1GB pud level and 4KB page size, iterations are reduced by a factor of 256K. In wall clock time, mapping a 32GB PCI BAR is reduced from ~1s to <1ms. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: linux-mm@kvack.org Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mitchell Augustin <mitchell.augustin@canonical.com> Tested-by: Mitchell Augustin <mitchell.augustin@canonical.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20250218222209.1382449-6-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-02-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.14-rc5). Conflicts: drivers/net/ethernet/cadence/macb_main.c fa52f15c745c ("net: cadence: macb: Synchronize stats calculations") 75696dd0fd72 ("net: cadence: macb: Convert to get_stats64") https://lore.kernel.org/20250224125848.68ee63e5@canb.auug.org.au Adjacent changes: drivers/net/ethernet/intel/ice/ice_sriov.c 79990cf5e7ad ("ice: Fix deinitializing VF in error path") a203163274a4 ("ice: simplify VF MSI-X managing") net/ipv4/tcp.c 18912c520674 ("tcp: devmem: don't write truncated dmabuf CMSGs to userspace") 297d389e9e5b ("net: prefix devmem specific helpers") net/mptcp/subflow.c 8668860b0ad3 ("mptcp: reset when MPTCP opts are dropped after join") c3349a22c200 ("mptcp: consolidate subflow cleanup") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear()Ryan Roberts
In order to fix a bug, arm64 needs to be told the size of the huge page for which the huge_pte is being cleared in huge_ptep_get_and_clear(). Provide for this by adding an `unsigned long sz` parameter to the function. This follows the same pattern as huge_pte_clear() and set_huge_pte_at(). This commit makes the required interface modifications to the core mm as well as all arches that implement this function (arm64, loongarch, mips, parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed in a separate commit. Cc: stable@vger.kernel.org Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit") Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390 Link: https://lore.kernel.org/r/20250226120656.2400136-2-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-02-27bpf: Use try_alloc_pages() to allocate pages for bpf needs.Alexei Starovoitov
Use try_alloc_pages() and free_pages_nolock() for BPF needs when context doesn't allow using normal alloc_pages. This is a prerequisite for further work. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20250222024427.30294-7-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-27mm, bpf: Introduce free_pages_nolock()Alexei Starovoitov
Introduce free_pages_nolock() that can free pages without taking locks. It relies on trylock and can be called from any context. Since spin_trylock() cannot be used in PREEMPT_RT from hard IRQ or NMI it uses lockless link list to stash the pages which will be freed by subsequent free_pages() from good context. Do not use llist unconditionally. BPF maps continuously allocate/free, so we cannot unconditionally delay the freeing to llist. When the memory becomes free make it available to the kernel and BPF users right away if possible, and fallback to llist as the last resort. Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20250222024427.30294-4-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-27Merge tag 'net-6.14-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth. We didn't get netfilter or wireless PRs this week, so next week's PR is probably going to be bigger. A healthy dose of fixes for bugs introduced in the current release nonetheless. Current release - regressions: - Bluetooth: always allow SCO packets for user channel - af_unix: fix memory leak in unix_dgram_sendmsg() - rxrpc: - remove redundant peer->mtu_lock causing lockdep splats - fix spinlock flavor issues with the peer record hash - eth: iavf: fix circular lock dependency with netdev_lock - net: use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net() RDMA driver register notifier after the device Current release - new code bugs: - ethtool: fix ioctl confusing drivers about desired HDS user config - eth: ixgbe: fix media cage present detection for E610 device Previous releases - regressions: - loopback: avoid sending IP packets without an Ethernet header - mptcp: reset connection when MPTCP opts are dropped after join Previous releases - always broken: - net: better track kernel sockets lifetime - ipv6: fix dst ref loop on input in seg6 and rpl lw tunnels - phy: qca807x: use right value from DTS for DAC_DSP_BIAS_CURRENT - eth: enetc: number of error handling fixes - dsa: rtl8366rb: reshuffle the code to fix config / build issue with LED support" * tag 'net-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits) net: ti: icss-iep: Reject perout generation request idpf: fix checksums set in idpf_rx_rsc() selftests: drv-net: Check if combined-count exists net: ipv6: fix dst ref loop on input in rpl lwt net: ipv6: fix dst ref loop on input in seg6 lwt usbnet: gl620a: fix endpoint checking in genelink_bind() net/mlx5: IRQ, Fix null string in debug print net/mlx5: Restore missing trace event when enabling vport QoS net/mlx5: Fix vport QoS cleanup on error net: mvpp2: cls: Fixed Non IP flow, with vlan tag flow defination. af_unix: Fix memory leak in unix_dgram_sendmsg() net: Handle napi_schedule() calls from non-interrupt net: Clear old fragment checksum value in napi_reuse_skb gve: unlink old napi when stopping a queue using queue API net: Use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net(). tcp: Defer ts_recent changes until req is owned net: enetc: fix the off-by-one issue in enetc_map_tx_tso_buffs() net: enetc: remove the mm_lock from the ENETC v4 driver net: enetc: add missing enetc4_link_deinit() net: enetc: update UDP checksum when updating originTimestamp field ...
2025-02-27mm, bpf: Introduce try_alloc_pages() for opportunistic page allocationAlexei Starovoitov
Tracing BPF programs execute from tracepoints and kprobes where running context is unknown, but they need to request additional memory. The prior workarounds were using pre-allocated memory and BPF specific freelists to satisfy such allocation requests. Instead, introduce gfpflags_allow_spinning() condition that signals to the allocator that running context is unknown. Then rely on percpu free list of pages to allocate a page. try_alloc_pages() -> get_page_from_freelist() -> rmqueue() -> rmqueue_pcplist() will spin_trylock to grab the page from percpu free list. If it fails (due to re-entrancy or list being empty) then rmqueue_bulk()/rmqueue_buddy() will attempt to spin_trylock zone->lock and grab the page from there. spin_trylock() is not safe in PREEMPT_RT when in NMI or in hard IRQ. Bailout early in such case. The support for gfpflags_allow_spinning() mode for free_page and memcg comes in the next patches. This is a first step towards supporting BPF requirements in SLUB and getting rid of bpf_mem_alloc. That goal was discussed at LSFMM: https://lwn.net/Articles/974138/ Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20250222024427.30294-3-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-27locking/local_lock: Introduce localtry_lock_tSebastian Andrzej Siewior
In !PREEMPT_RT local_lock_irqsave() disables interrupts to protect critical section, but it doesn't prevent NMI, so the fully reentrant code cannot use local_lock_irqsave() for exclusive access. Introduce localtry_lock_t and localtry_lock_irqsave() that disables interrupts and sets acquired=1, so localtry_lock_irqsave() from NMI attempting to acquire the same lock will return false. In PREEMPT_RT local_lock_irqsave() maps to preemptible spin_lock(). Map localtry_lock_irqsave() to preemptible spin_trylock(). When in hard IRQ or NMI return false right away, since spin_trylock() is not safe due to explicit locking in the underneath rt_spin_trylock() implementation. Removing this explicit locking and attempting only "trylock" is undesired due to PI implications. Note there is no need to use local_inc for acquired variable, since it's a percpu variable with strict nesting scopes. Acked-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20250222024427.30294-2-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-27io_uring/nvme: pass issue_flags to io_uring_cmd_import_fixed()Pavel Begunkov
io_uring_cmd_import_fixed() will need to know the io_uring execution state in following commits, for now just pass issue_flags into it without actually using. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250224213116.3509093-5-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-27drivers: base: component: add function to query the bound statusHeiko Stuebner
The component helpers already expose the bound status in debugfs, but at times it might be necessary to also check that state in the kernel and act differently depending on the result. For example the shutdown handler of a drm-driver might need to stop a whole output pipeline if the drm device is up and running, but may run into problems if that drm-device has never been set up before, for example because the binding deferred. So add a little helper that returns the bound status for a componet device. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://patchwork.freedesktop.org/patch/msgid/20250220234141.2788785-2-heiko@sntech.de
2025-02-27regcache: Add support for sorting defaults arraysCharles Keepax
The defaults array in regcache must be sorted into ascending register address order, because binary search is used to locate values in the array. Add a helper to sort the register defaults array which can be useful for systems that dynamically create a defaults array based on external information. Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.dev> Link: https://patch.msgid.link/20250217140159.2288784-2-ckeepax@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-27net: skbuff: introduce napi_skb_cache_get_bulk()Alexander Lobakin
Add a function to get an array of skbs from the NAPI percpu cache. It's supposed to be a drop-in replacement for kmem_cache_alloc_bulk(skbuff_head_cache, GFP_ATOMIC) and xdp_alloc_skb_bulk(GFP_ATOMIC). The difference (apart from the requirement to call it only from the BH) is that it tries to use as many NAPI cache entries for skbs as possible, and allocate new ones only if needed. The logic is as follows: * there is enough skbs in the cache: decache them and return to the caller; * not enough: try refilling the cache first. If there is now enough skbs, return; * still not enough: try allocating skbs directly to the output array with %GFP_ZERO, maybe we'll be able to get some. If there's now enough, return; * still not enough: return as many as we were able to obtain. Most of times, if called from the NAPI polling loop, the first one will be true, sometimes (rarely) the second one. The third and the fourth -- only under heavy memory pressure. It can save significant amounts of CPU cycles if there are GRO cycles and/or Tx completion cycles (anything that descends to napi_skb_cache_put()) happening on this CPU. Tested-by: Daniel Xu <dxu@dxuuu.xyz> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-27net: gro: decouple GRO from the NAPI layerAlexander Lobakin
In fact, these two are not tied closely to each other. The only requirements to GRO are to use it in the BH context and have some sane limits on the packet batches, e.g. NAPI has a limit of its budget (64/8/etc.). Move purely GRO fields into a new structure, &gro_node. Embed it into &napi_struct and adjust all the references. gro_node::cached_napi_id is effectively the same as napi_struct::napi_id, but to be used on GRO hotpath to mark skbs. napi_struct::napi_id is now a fully control path field. Three Ethernet drivers use napi_gro_flush() not really meant to be exported, so move it to <net/gro.h> and add that include there. napi_gro_receive() is used in more than 100 drivers, keep it in <linux/netdevice.h>. This does not make GRO ready to use outside of the NAPI context yet. Tested-by: Daniel Xu <dxu@dxuuu.xyz> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-27iomap: make buffered writes work with RWF_DONTCACHEJens Axboe
Add iomap buffered write support for RWF_DONTCACHE. If RWF_DONTCACHE is set for a write, mark the folios being written as uncached. Then writeback completion will drop the pages. The write_iter handler simply kicks off writeback for the pages, and writeback completion will take care of the rest. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20250204184047.356762-2-axboe@kernel.dk Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27Merge patch series "prep patches for my mkdir series"Christian Brauner
NeilBrown <neilb@suse.de> says: These two patches are cleanup are dependencies for my mkdir changes and subsequence directory locking changes. * patches from https://lore.kernel.org/r/20250226062135.2043651-1-neilb@suse.de: (2 commits) nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked() nfs/vfs: discard d_exact_alias() Link: https://lore.kernel.org/r/20250226062135.2043651-1-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27dmaengine: ti: k3-udma-glue: Drop skip_fdq argument from ↵Roger Quadros
k3_udma_glue_reset_rx_chn The user of k3_udma_glue_reset_rx_chn() e.g. ti_am65_cpsw_nuss can run on multiple platforms having different DMA architectures. On some platforms there can be one FDQ for all flows in the RX channel while for others there is a separate FDQ for each flow in the RX channel. So far we have been relying on the skip_fdq argument of k3_udma_glue_reset_rx_chn(). Instead of relying on the user to provide this information, infer it based on DMA architecture during k3_udma_glue_request_rx_chn() and save it in an internal flag 'single_fdq'. Use that flag at k3_udma_glue_reset_rx_chn() to deicide if the FDQ needs to be cleared for every flow or just for flow 0. Fixes the below issue on ti_am65_cpsw_nuss driver on AM62-SK. > ip link set eth1 down > ip link set eth0 down > ethtool -L eth0 rx 8 > ip link set eth0 up > modprobe -r ti_am65_cpsw_nuss [ 103.045726] ------------[ cut here ]------------ [ 103.050505] k3_knav_desc_pool size 512000 != avail 64000 [ 103.050703] WARNING: CPU: 1 PID: 450 at drivers/net/ethernet/ti/k3-cppi-desc-pool.c:33 k3_cppi_desc_pool_destroy+0xa0/0xa8 [k3_cppi_desc_pool] [ 103.068810] Modules linked in: ti_am65_cpsw_nuss(-) k3_cppi_desc_pool snd_soc_hdmi_codec crct10dif_ce snd_soc_simple_card snd_soc_simple_card_utils display_connector rtc_ti_k3 k3_j72xx_bandgap tidss drm_client_lib snd_soc_davinci_mcas p drm_dma_helper tps6598x phylink snd_soc_ti_udma rti_wdt drm_display_helper snd_soc_tlv320aic3x_i2c typec at24 phy_gmii_sel snd_soc_ti_edma snd_soc_tlv320aic3x sii902x snd_soc_ti_sdma sa2ul omap_mailbox drm_kms_helper authenc cfg80211 r fkill fuse drm drm_panel_orientation_quirks backlight ip_tables x_tables ipv6 [last unloaded: k3_cppi_desc_pool] [ 103.119950] CPU: 1 UID: 0 PID: 450 Comm: modprobe Not tainted 6.13.0-rc7-00001-g9c5e3435fa66 #1011 [ 103.119968] Hardware name: Texas Instruments AM625 SK (DT) [ 103.119974] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 103.119983] pc : k3_cppi_desc_pool_destroy+0xa0/0xa8 [k3_cppi_desc_pool] [ 103.148007] lr : k3_cppi_desc_pool_destroy+0xa0/0xa8 [k3_cppi_desc_pool] [ 103.154709] sp : ffff8000826ebbc0 [ 103.158015] x29: ffff8000826ebbc0 x28: ffff0000090b6300 x27: 0000000000000000 [ 103.165145] x26: 0000000000000000 x25: 0000000000000000 x24: ffff0000019df6b0 [ 103.172271] x23: ffff0000019df6b8 x22: ffff0000019df410 x21: ffff8000826ebc88 [ 103.179397] x20: 000000000007d000 x19: ffff00000a3b3000 x18: 0000000000000000 [ 103.186522] x17: 0000000000000000 x16: 0000000000000000 x15: 000001e8c35e1cde [ 103.193647] x14: 0000000000000396 x13: 000000000000035c x12: 0000000000000000 [ 103.200772] x11: 000000000000003a x10: 00000000000009c0 x9 : ffff8000826eba20 [ 103.207897] x8 : ffff0000090b6d20 x7 : ffff00007728c180 x6 : ffff00007728c100 [ 103.215022] x5 : 0000000000000001 x4 : ffff000000508a50 x3 : ffff7ffff6146000 [ 103.222147] x2 : 0000000000000000 x1 : e300b4173ee6b200 x0 : 0000000000000000 [ 103.229274] Call trace: [ 103.231714] k3_cppi_desc_pool_destroy+0xa0/0xa8 [k3_cppi_desc_pool] (P) [ 103.238408] am65_cpsw_nuss_free_rx_chns+0x28/0x4c [ti_am65_cpsw_nuss] [ 103.244942] devm_action_release+0x14/0x20 [ 103.249040] release_nodes+0x3c/0x68 [ 103.252610] devres_release_all+0x8c/0xdc [ 103.256614] device_unbind_cleanup+0x18/0x60 [ 103.260876] device_release_driver_internal+0xf8/0x178 [ 103.266004] driver_detach+0x50/0x9c [ 103.269571] bus_remove_driver+0x6c/0xbc [ 103.273485] driver_unregister+0x30/0x60 [ 103.277401] platform_driver_unregister+0x14/0x20 [ 103.282096] am65_cpsw_nuss_driver_exit+0x18/0xff4 [ti_am65_cpsw_nuss] [ 103.288620] __arm64_sys_delete_module+0x17c/0x25c [ 103.293404] invoke_syscall+0x44/0x100 [ 103.297149] el0_svc_common.constprop.0+0xc0/0xe0 [ 103.301845] do_el0_svc+0x1c/0x28 [ 103.305155] el0_svc+0x28/0x98 [ 103.308207] el0t_64_sync_handler+0xc8/0xcc [ 103.312384] el0t_64_sync+0x198/0x19c [ 103.316040] ---[ end trace 0000000000000000 ]--- Signed-off-by: Roger Quadros <rogerq@kernel.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com> Link: https://lore.kernel.org/r/20250224-k3-udma-glue-single-fdq-v2-1-cbe7621f2507@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org>