summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2025-03-01kvm: retry nx_huge_page_recovery_thread creationKeith Busch
A VMM may send a non-fatal signal to its threads, including vCPU tasks, at any time, and thus may signal vCPU tasks during KVM_RUN. If a vCPU task receives the signal while its trying to spawn the huge page recovery vhost task, then KVM_RUN will fail due to copy_process() returning -ERESTARTNOINTR. Rework call_once() to mark the call complete if and only if the called function succeeds, and plumb the function's true error code back to the call_once() invoker. This provides userspace with the correct, non-fatal error code so that the VMM doesn't terminate the VM on -ENOMEM, and allows subsequent KVM_RUN a succeed by virtue of retrying creation of the NX huge page task. Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> [implemented the kvm user side] Signed-off-by: Keith Busch <kbusch@kernel.org> Message-ID: <20250227230631.303431-3-kbusch@meta.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-03-01perf: arm_pmu: Move PMUv3-specific dataMark Rutland
A few fields in struct arm_pmu are only used with PMUv3, and soon we will need to add more for BRBE. Group the fields together so that we have a logical place to add more data in future. At the same time, remove the comment for reg_pmmir as it doesn't convey anything useful. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Tested-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250218-arm-brbe-v19-v20-7-4e9922fc2e8e@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2025-02-28io_uring/ublk: report error when unregister operation failsCaleb Sander Mateos
Indicate to userspace applications if a UBLK_IO_UNREGISTER_IO_BUF command specifies an invalid buffer index by returning an error code. Return -EINVAL if no buffer is registered with the given index, and -EBUSY if the registered buffer is not a kernel bvec. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250228231432.642417-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28io_uring: convert cmd_to_io_kiocb() macro to functionCaleb Sander Mateos
The cmd_to_io_kiocb() macro applies a pointer cast to its input without parenthesizing it. Currently all inputs are variable names, so this has the intended effect. But since casts have relatively high precedence, the macro would apply the cast to the wrong value if the input was a pointer addition, for example. Turn the macro into a static inline function to ensure the pointer cast is applied to the full input value. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250228230305.630885-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28io_uring/uring_cmd: specify io_uring_cmd_import_fixed() pointer typeCaleb Sander Mateos
io_uring_cmd_import_fixed() takes a struct io_uring_cmd *, but the type of the ioucmd parameter is void *. Make the pointer type explicit so the compiler can type check it. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250228221514.604350-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28Merge tag 'objtool-urgent-2025-02-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Ingo Molnar: "Fix an objtool false positive, and objtool related build warnings that happens on PIE-enabled architectures such as LoongArch" * tag 'objtool-urgent-2025-02-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Add bch2_trans_unlocked_or_in_restart_error() to bcachefs noreturns objtool: Fix C jump table annotations for Clang vmlinux.lds: Ensure that const vars with relocations are mapped R/O
2025-02-28Merge tag 'locking-urgent-2025-02-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: "Fix an rcuref_put() slowpath race" * tag 'locking-urgent-2025-02-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcuref: Plug slowpath race in rcuref_put()
2025-02-28pwm: Check for CONFIG_PWM using IS_REACHABLE() in main headerUwe Kleine-König
Preparing CONFIG_PWM becoming tristate the right magic to check for the availability of the pwm functions is using IS_REACHABLE() and not IS_ENABLED(). The latter gives the wrong result for built-in code with CONFIG_PWM=m. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://lore.kernel.org/r/20250217102504.687916-2-u.kleine-koenig@baylibre.com Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
2025-02-28Merge branch 'v6.15-shared/clkids' into v6.15-clk/nextHeiko Stuebner
2025-02-28dt-bindings: clock: Add RK3562 cruKever Yang
Document the device tree bindings of the rockchip rk3562 SoC clock and reset unit. Signed-off-by: Kever Yang <kever.yang@rock-chips.com> Reviewed-by: "Rob Herring (Arm)" <robh@kernel.org> Link: https://lore.kernel.org/r/20250227105916.2340856-2-kever.yang@rock-chips.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-02-28driver core: Introduce device_{add,remove}_of_node()Herve Codina
An of_node can be set to a device using device_set_node(), which does not prevent any of_node and/or fwnode overwrites. When adding an of_node on an already present device, the following operations need to be done: - Attach the of_node only if no of_node is already attached - Attach the of_node as a fwnode if no fwnode were already attached This is the purpose of device_add_of_node(). device_remove_of_node() reverts the operations done by device_add_of_node(). Link: https://lore.kernel.org/r/20250224141356.36325-2-herve.codina@bootlin.com Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-28nilfs2: Mark on-disk strings as nonstringKees Cook
In preparation for memtostr*() checking that its source is marked as nonstring, annotate the device strings accordingly using the new UAPI alias for the "nonstring" attribute. Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-28uapi: stddef.h: Introduce __kernel_nonstringKees Cook
In order to annotate byte arrays in UAPI that are not C strings (i.e. they may not be NUL terminated), the "nonstring" attribute is needed. However, we can't expose this to userspace as it is compiler version specific. Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-28mm: security: Check early if HARDENED_USERCOPY is enabledMel Gorman
HARDENED_USERCOPY is checked within a function so even if disabled, the function overhead still exists. Move the static check inline. This is at best a micro-optimisation and any difference in performance was within noise but it is relatively consistent with the init_on_* implementations. Suggested-by: Kees Cook <kees@kernel.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Link: https://lore.kernel.org/r/20250123221115.19722-4-mgorman@techsingularity.net Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-28uaccess: Introduce ucopysize.hKees Cook
The object size sanity checking macros that uaccess.h and uio.h use have been living in thread_info.h for historical reasons. Needing to use jump labels for these checks, however, introduces a header include loop under certain conditions. The dependencies for the object checking macros are very limited, but they are used by separate header files, so introduce a new header that can be used directly by uaccess.h and uio.h. As a result, this also means thread_info.h (which is rather large) and be removed from those headers. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502281153.TG2XK5SI-lkp@intel.com/ Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-28net: stmmac: provide set_clk_tx_rate() hookRussell King (Oracle)
Several stmmac sub-drivers which support RGMII follow the same pattern. They calculate the transmit clock rate, and then call clk_set_rate(). Analysis of several implementation documents suggests that the platform is responsible for providing the transmit clock to the DWMAC core's clk_tx_i. The expected rates are: 10Mbps 100Mbps 1Gbps MII 2.5MHz 25MHz RMII 2.5MHz 25MHz GMII 125MHz RGMI 2.5MHz 25MHz 125MHz It seems some platforms require this clock to be manually configured, but there are outputs from the MAC core that indicate the speed, so a platform may use these to automatically configure the clock. Thus, we can't just provide one solution to configure this clock rate. Moreover, the clock may need to be derived from one of several sources depending on the interface mode. Provide a platform hook that is passed the transmit clock, interface mode and speed. Reviewed-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1tna0F-0052sS-Lr@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-28Merge tag 'block-6.14-20250228' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - Fix plugging for native zone writes - Fix segment limit settings for != 4K page size archs - Fix for slab names overflowing * tag 'block-6.14-20250228' of git://git.kernel.dk/linux: block: fix 'kmem_cache of name 'bio-108' already exists' block: Remove zone write plugs when handling native zone append writes block: make segment size limit workable for > 4K PAGE_SIZE
2025-02-28uapi: Revert "bitops: avoid integer overflow in GENMASK(_ULL)"I Hsin Cheng
This patch reverts 'commit c32ee3d9abd2("bitops: avoid integer overflow in GENMASK(_ULL)")'. The code generation can be shrink by over 1KB by reverting this commit. Originally the commit claimed that clang would emit warnings using the implementation at that time. The patch was applied and tested against numerous compilers, including gcc-13, gcc-12, gcc-11 cross-compiler, clang-17, clang-18 and clang-19. Various warning levels were set (-W=0, -W=1, -W=2) and CONFIG_WERROR disabled to complete the compilation. The results show that no compilation errors or warnings were generated due to the patch. The results of code size reduction are summarized in the following table. The code size changes for clang are all zero across different versions, so they're not listed in the table. For NR_CPUS=64 on x86_64. ---------------------------------------------- | | gcc-13 | gcc-12 | gcc-11 | ---------------------------------------------- | old | 22438085 | 22453915 | 22302033 | ---------------------------------------------- | new | 22436816 | 22452913 | 22300826 | ---------------------------------------------- | new - old | -1269 | -1002 | -1207 | ---------------------------------------------- For NR_CPUS=1024 on x86_64. ---------------------------------------------- | | gcc-13 | gcc-12 | gcc-11 | ---------------------------------------------- | old | 22493682 | 22509812 | 22357661 | ---------------------------------------------- | new | 22493230 | 22509487 | 22357250 | ---------------------------------------------- | new - old | -452 | -325 | -411 | ---------------------------------------------- For arm64 architecture, gcc cross-compiler was used and QEMU was utilized to execute a VM for a CPU-heavy workload to ensure no side effects and that functionalities remained correct. The test even demonstrated a positive result in terms of code size reduction: * Before: 31660668 * After: 31658724 * Difference (After - Before): -1944 An analysis of multiple functions compiled with gcc-13 on x86_64 was performed. In summary, the patch elimates one negation in almost every use case. However, negative effects may occur in some cases, such as the generation of additional "mov" instruction or increased register usage. The use of "~_UL(0) << (l)" may even result in the allocations of "%r*" registers instead of "%e*" registers (which are 32-bit registers) because the compiler cannot assume that the higher bits are zero. Yury: We limit GENMASK() usage with the const_true(l > h) condition, and most of users just call it with constant parameters. For those, the actual implementation of the macro doesn't matter, and since it triggered clang warnings back then, it was reasonable to workaround the warnings on the kernel side. Now that some find_bit() functions call GENMASK() with runtime parameters (although the const_true() condition holds), this ended up hurting the generated code, as I Hsin discovered. This is especially bad because it hurts small_const_nbits() optimization, where people are most concerned about generated code quality. So, revert it to the original version for good. Signed-off-by: I Hsin Cheng <richard120310@gmail.com> Signed-off-by: Yury Norov <yury.norov@gmail.com>
2025-02-28Convert regulator drivers to useMark Brown
Merge series from Raag Jadav <raag.jadav@intel.com>: This series converts regulator drivers to use the newly introduced[1] devm_kmemdup_array() helper. [1] https://lore.kernel.org/r/20250212062513.2254767-1-raag.jadav@intel.com
2025-02-28Convert sound drivers to use devm_kmemdup_array()Mark Brown
Merge series from Raag Jadav <raag.jadav@intel.com>: This series converts sound drivers to use the newly introduced[1] devm_kmemdup_array() helper. [1] https://lore.kernel.org/r/20250212062513.2254767-1-raag.jadav@intel.com
2025-02-28Merge tag 'for-joerg' of ↵Joerg Roedel
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd into core iommu shared branch with iommufd The three dependent series on a shared branch: - Change the iommufd fault handle into an always present hwpt handle in the domain - Give iommufd its own SW_MSI implementation along with some IRQ layer rework - Improvements to the handle attach API
2025-02-28Merge drm/drm-next into drm-xe-nextLucas De Marchi
Sync to fix conlicts between drm-xe-next and drm-intel-next. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-28io_uring: cache nodes and mapped buffersKeith Busch
Frequent alloc/free cycles on these is pretty costly. Use an io cache to more efficiently reuse these buffers. Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250227223916.143006-7-kbusch@meta.com [axboe: fix imu leak] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28ublk: zc register/unregister bvecKeith Busch
Provide new operations for the user to request mapping an active request to an io uring instance's buf_table. The user has to provide the index it wants to install the buffer. A reference count is taken on the request to ensure it can't be completed while it is active in a ring's buf_table. Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250227223916.143006-6-kbusch@meta.com Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28io_uring: add support for kernel registered bvecsKeith Busch
Provide an interface for the kernel to leverage the existing pre-registered buffers that io_uring provides. User space can reference these later to achieve zero-copy IO. User space must register an empty fixed buffer table with io_uring in order for the kernel to make use of it. Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250227223916.143006-5-kbusch@meta.com Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28wifi: cfg80211: reorg sinfo structure elements for meshSarika Sharma
Currently, as multi-link operation(MLO) is not supported for mesh, reorganize the sinfo structure for mesh-specific fields and embed mesh related NL attributes together in organized view. This will allow for the simplified reorganization of sinfo structure for link level in a subsequent patch to add support for MLO station statistics. No functionality changes added. Pahole summary before the reorg of sinfo structure: - size: 256, cachelines: 4, members: 50 - sum members: 239, holes: 4, sum holes: 17 - paddings: 2, sum paddings: 2 - forced alignments: 1, forced holes: 1, sum forced holes: 1 Pahole summary after the reorg of sinfo structure: - size: 248, cachelines: 4, members: 50 - sum members: 239, holes: 4, sum holes: 9 - paddings: 2, sum paddings: 2 - forced alignments: 1, last cacheline: 56 bytes Signed-off-by: Sarika Sharma <quic_sarishar@quicinc.com> Link: https://patch.msgid.link/20250213171632.1646538-2-quic_sarishar@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-28pmdomain: Merge tag regulator-devm-of-get into nextUlf Hansson
Merge the tag regulator-devm-of-get from git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator into next. This introduces devm_of_regulator_get without the _optional suffix, since that is more sensible for the Rockchip usecase. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-02-28pmdomain: Merge tag 'v6.14-rc4' from Linus into nextUlf Hansson
Linux 6.14-rc4
2025-02-28fs: Remove page_mkwrite_check_truncate()Matthew Wilcox (Oracle)
All callers of this function have now been converted to use folio_mkwrite_check_truncate(). Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250221204421.3590340-1-willy@infradead.org Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-28Merge tag 'drm-misc-next-2025-02-27' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.15: Cross-subsystem Changes: bus: - mhi: Avoid access to uninitialized field Core Changes: - Fix docmentation dp: - Add helpers for LTTPR transparent mode sched: - Improve job peek/pop operations - Optimize layout of struct drm_sched_job Driver Changes: arc: - Convert to devm_platform_ioremap_resource() aspeed: - Convert to devm_platform_ioremap_resource() bridge: - ti-sn65dsi86: Support CONFIG_PWM tristate i915: - dp: Use helpers for LTTPR transparent mode mediatek: - Convert to devm_platform_ioremap_resource() msm: - dp: Use helpers for LTTPR transparent mode nouveau: - dp: Use helpers for LTTPR transparent mode panel: - raydium-rm67200: Add driver for Raydium RM67200 - simple: Add support for BOE AV123Z7M-N17, BOE AV123Z7M-N17 - sony-td4353-jdi: Use MIPI-DSI multi-func interface - summit: Add driver for Apple Summit display panel - visionox-rm692e5: Add driver for Visionox RM692E5 repaper: - Fix integer overflows stm: - Convert to devm_platform_ioremap_resource() vc4: - Convert to devm_platform_ioremap_resource() Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20250227094041.GA114623@linux.fritz.box
2025-02-27geneve: Allow users to specify source port rangeDaniel Borkmann
Recently, in case of Cilium, we run into users on Azure who require to use tunneling for east/west traffic due to hitting IPAM API limits for Kubernetes Pods if they would have gone with publicly routable IPs for Pods. In case of tunneling, Cilium supports the option of vxlan or geneve. In order to RSS spread flows among remote CPUs both derive a source port hash via udp_flow_src_port() which takes the inner packet's skb->hash into account. For clusters with many nodes, this can then hit a new limitation [0]: Today, the Azure networking stack supports 1M total flows (500k inbound and 500k outbound) for a VM. [...] Once this limit is hit, other connections are dropped. [...] Each flow is distinguished by a 5-tuple (protocol, local IP address, remote IP address, local port, and remote port) information. [...] For vxlan and geneve, this can create a massive amount of UDP flows which then run into the limits if stale flows are not evicted fast enough. One option to mitigate this for vxlan is to narrow the source port range via IFLA_VXLAN_PORT_RANGE while still being able to benefit from RSS. However, geneve currently does not have this option and it spreads traffic across the full source port range of [1, USHRT_MAX]. To overcome this limitation also for geneve, add an equivalent IFLA_GENEVE_PORT_RANGE setting for users. Note that struct geneve_config before/after still remains at 2 cachelines on x86-64. The low/high members of struct ifla_geneve_port_range (which is uapi exposed) are of type __be16. While they would be perfectly fine to be of __u16 type, the consensus was that it would be good to be consistent with the existing struct ifla_vxlan_port_range from a uapi consumer PoV. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://learn.microsoft.com/en-us/azure/virtual-network/virtual-machine-network-throughput [0] Link: https://patch.msgid.link/20250226182030.89440-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27net: qed: make 'qed_ll2_ops_pass' as __maybe_unusedArnd Bergmann
gcc warns about unused const variables even in header files when building with W=1: In file included from include/linux/qed/qed_rdma_if.h:14, from drivers/net/ethernet/qlogic/qed/qed_rdma.h:16, from drivers/net/ethernet/qlogic/qed/qed_cxt.c:23: include/linux/qed/qed_ll2_if.h:270:33: error: 'qed_ll2_ops_pass' defined but not used [-Werror=unused-const-variable=] 270 | static const struct qed_ll2_ops qed_ll2_ops_pass = { This one is intentional, so mark it as __maybe_unused to it can be included from a file that doesn't use this variable. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> # build-tested Link: https://patch.msgid.link/20250225200926.4057723-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-28Merge branch 'ib-amlogic-a4' into develLinus Walleij
Merge immutable branch into devel for next. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2025-02-28pinctrl: pinconf-generic: Add API for pinmux propertity in DTS fileXianwei Zhao
When describing pin mux func through pinmux propertity, a standard API is added for support. The pinmux contains pin identification and mux values, which can include multiple pins. And groups configuration use other word. DTS such as: func-name { group_alias: group-name{ pinmux= <pin_id << 8 | mux_value)>, <pin_id << 8 | mux_value)>; bias-pull-up; drive-strength-microamp = <4000>; }; }; Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com> Link: https://lore.kernel.org/20250212-amlogic-pinctrl-v5-2-282bc2516804@amlogic.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2025-02-28dt-bindings: pinctrl: Add support for Amlogic A4 SoCXianwei Zhao
Add the dt-bindings for Amlogic pin controller, and add a new dt-binding header file which document the GPIO bank names of Amlogic A4 SoC. Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/20250212-amlogic-pinctrl-v5-1-282bc2516804@amlogic.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2025-02-27Merge branch 'ib-sophgo' into develLinus Walleij
Pull the immutable branch into devel for next. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2025-02-27dt-bindings: pinctrl: Add pinctrl for Sophgo SG2042 series SoCInochi Amaoto
SG2042 introduces a simple pinctrl device for all configurable pins. For the SG2042 pinctl register file, each register (32 bits) is responsible for two pins, each occupying the upper 16 bits and lower 16 bits of the register. It supports setting pull up/down, drive strength and input schmitt trigger. Add support for SG2042 pinctrl device. Signed-off-by: Inochi Amaoto <inochiama@gmail.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/20250211051801.470800-6-inochiama@gmail.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2025-02-27drm/amdkfd: clamp queue size to minimumDavid Yat Sin
If queue size is less than minimum, clamp it to minimum to prevent underflow when writing queue mqd. Signed-off-by: David Yat Sin <David.YatSin@amd.com> Reviewed-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27Change inode_operations.mkdir to return struct dentry *NeilBrown
Some filesystems, such as NFS, cifs, ceph, and fuse, do not have complete control of sequencing on the actual filesystem (e.g. on a different server) and may find that the inode created for a mkdir request already exists in the icache and dcache by the time the mkdir request returns. For example, if the filesystem is mounted twice the directory could be visible on the other mount before it is on the original mount, and a pair of name_to_handle_at(), open_by_handle_at() calls could instantiate the directory inode with an IS_ROOT() dentry before the first mkdir returns. This means that the dentry passed to ->mkdir() may not be the one that is associated with the inode after the ->mkdir() completes. Some callers need to interact with the inode after the ->mkdir completes and they currently need to perform a lookup in the (rare) case that the dentry is no longer hashed. This lookup-after-mkdir requires that the directory remains locked to avoid races. Planned future patches to lock the dentry rather than the directory will mean that this lookup cannot be performed atomically with the mkdir. To remove this barrier, this patch changes ->mkdir to return the resulting dentry if it is different from the one passed in. Possible returns are: NULL - the directory was created and no other dentry was used ERR_PTR() - an error occurred non-NULL - this other dentry was spliced in This patch only changes file-systems to return "ERR_PTR(err)" instead of "err" or equivalent transformations. Subsequent patches will make further changes to some file-systems to return a correct dentry. Not all filesystems reliably result in a positive hashed dentry: - NFS, cifs, hostfs will sometimes need to perform a lookup of the name to get inode information. Races could result in this returning something different. Note that this lookup is non-atomic which is what we are trying to avoid. Placing the lookup in filesystem code means it only happens when the filesystem has no other option. - kernfs and tracefs leave the dentry negative and the ->revalidate operation ensures that lookup will be called to correctly populate the dentry. This could be fixed but I don't think it is important to any of the users of vfs_mkdir() which look at the dentry. The recommendation to use d_drop();d_splice_alias() is ugly but fits with current practice. A planned future patch will change this. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27mm: Provide address mask in struct follow_pfnmap_argsAlex Williamson
follow_pfnmap_start() walks the page table for a given address and fills out the struct follow_pfnmap_args in pfnmap_args_setup(). The address mask of the page table level is already provided to this latter function for calculating the pfn. This address mask can also be useful for the caller to determine the extent of the contiguous mapping. For example, vfio-pci now supports huge_fault for pfnmaps and is able to insert pud and pmd mappings. When we DMA map these pfnmaps, ex. PCI MMIO BARs, we iterate follow_pfnmap_start() to get each pfn to test for a contiguous pfn range. Providing the mapping address mask allows us to skip the extent of the mapping level. Assuming a 1GB pud level and 4KB page size, iterations are reduced by a factor of 256K. In wall clock time, mapping a 32GB PCI BAR is reduced from ~1s to <1ms. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: linux-mm@kvack.org Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mitchell Augustin <mitchell.augustin@canonical.com> Tested-by: Mitchell Augustin <mitchell.augustin@canonical.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20250218222209.1382449-6-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-02-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.14-rc5). Conflicts: drivers/net/ethernet/cadence/macb_main.c fa52f15c745c ("net: cadence: macb: Synchronize stats calculations") 75696dd0fd72 ("net: cadence: macb: Convert to get_stats64") https://lore.kernel.org/20250224125848.68ee63e5@canb.auug.org.au Adjacent changes: drivers/net/ethernet/intel/ice/ice_sriov.c 79990cf5e7ad ("ice: Fix deinitializing VF in error path") a203163274a4 ("ice: simplify VF MSI-X managing") net/ipv4/tcp.c 18912c520674 ("tcp: devmem: don't write truncated dmabuf CMSGs to userspace") 297d389e9e5b ("net: prefix devmem specific helpers") net/mptcp/subflow.c 8668860b0ad3 ("mptcp: reset when MPTCP opts are dropped after join") c3349a22c200 ("mptcp: consolidate subflow cleanup") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear()Ryan Roberts
In order to fix a bug, arm64 needs to be told the size of the huge page for which the huge_pte is being cleared in huge_ptep_get_and_clear(). Provide for this by adding an `unsigned long sz` parameter to the function. This follows the same pattern as huge_pte_clear() and set_huge_pte_at(). This commit makes the required interface modifications to the core mm as well as all arches that implement this function (arm64, loongarch, mips, parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed in a separate commit. Cc: stable@vger.kernel.org Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit") Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390 Link: https://lore.kernel.org/r/20250226120656.2400136-2-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-02-27bpf: Use try_alloc_pages() to allocate pages for bpf needs.Alexei Starovoitov
Use try_alloc_pages() and free_pages_nolock() for BPF needs when context doesn't allow using normal alloc_pages. This is a prerequisite for further work. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20250222024427.30294-7-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-27mm, bpf: Introduce free_pages_nolock()Alexei Starovoitov
Introduce free_pages_nolock() that can free pages without taking locks. It relies on trylock and can be called from any context. Since spin_trylock() cannot be used in PREEMPT_RT from hard IRQ or NMI it uses lockless link list to stash the pages which will be freed by subsequent free_pages() from good context. Do not use llist unconditionally. BPF maps continuously allocate/free, so we cannot unconditionally delay the freeing to llist. When the memory becomes free make it available to the kernel and BPF users right away if possible, and fallback to llist as the last resort. Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20250222024427.30294-4-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-27Merge tag 'net-6.14-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth. We didn't get netfilter or wireless PRs this week, so next week's PR is probably going to be bigger. A healthy dose of fixes for bugs introduced in the current release nonetheless. Current release - regressions: - Bluetooth: always allow SCO packets for user channel - af_unix: fix memory leak in unix_dgram_sendmsg() - rxrpc: - remove redundant peer->mtu_lock causing lockdep splats - fix spinlock flavor issues with the peer record hash - eth: iavf: fix circular lock dependency with netdev_lock - net: use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net() RDMA driver register notifier after the device Current release - new code bugs: - ethtool: fix ioctl confusing drivers about desired HDS user config - eth: ixgbe: fix media cage present detection for E610 device Previous releases - regressions: - loopback: avoid sending IP packets without an Ethernet header - mptcp: reset connection when MPTCP opts are dropped after join Previous releases - always broken: - net: better track kernel sockets lifetime - ipv6: fix dst ref loop on input in seg6 and rpl lw tunnels - phy: qca807x: use right value from DTS for DAC_DSP_BIAS_CURRENT - eth: enetc: number of error handling fixes - dsa: rtl8366rb: reshuffle the code to fix config / build issue with LED support" * tag 'net-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits) net: ti: icss-iep: Reject perout generation request idpf: fix checksums set in idpf_rx_rsc() selftests: drv-net: Check if combined-count exists net: ipv6: fix dst ref loop on input in rpl lwt net: ipv6: fix dst ref loop on input in seg6 lwt usbnet: gl620a: fix endpoint checking in genelink_bind() net/mlx5: IRQ, Fix null string in debug print net/mlx5: Restore missing trace event when enabling vport QoS net/mlx5: Fix vport QoS cleanup on error net: mvpp2: cls: Fixed Non IP flow, with vlan tag flow defination. af_unix: Fix memory leak in unix_dgram_sendmsg() net: Handle napi_schedule() calls from non-interrupt net: Clear old fragment checksum value in napi_reuse_skb gve: unlink old napi when stopping a queue using queue API net: Use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net(). tcp: Defer ts_recent changes until req is owned net: enetc: fix the off-by-one issue in enetc_map_tx_tso_buffs() net: enetc: remove the mm_lock from the ENETC v4 driver net: enetc: add missing enetc4_link_deinit() net: enetc: update UDP checksum when updating originTimestamp field ...
2025-02-27mm, bpf: Introduce try_alloc_pages() for opportunistic page allocationAlexei Starovoitov
Tracing BPF programs execute from tracepoints and kprobes where running context is unknown, but they need to request additional memory. The prior workarounds were using pre-allocated memory and BPF specific freelists to satisfy such allocation requests. Instead, introduce gfpflags_allow_spinning() condition that signals to the allocator that running context is unknown. Then rely on percpu free list of pages to allocate a page. try_alloc_pages() -> get_page_from_freelist() -> rmqueue() -> rmqueue_pcplist() will spin_trylock to grab the page from percpu free list. If it fails (due to re-entrancy or list being empty) then rmqueue_bulk()/rmqueue_buddy() will attempt to spin_trylock zone->lock and grab the page from there. spin_trylock() is not safe in PREEMPT_RT when in NMI or in hard IRQ. Bailout early in such case. The support for gfpflags_allow_spinning() mode for free_page and memcg comes in the next patches. This is a first step towards supporting BPF requirements in SLUB and getting rid of bpf_mem_alloc. That goal was discussed at LSFMM: https://lwn.net/Articles/974138/ Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20250222024427.30294-3-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-27locking/local_lock: Introduce localtry_lock_tSebastian Andrzej Siewior
In !PREEMPT_RT local_lock_irqsave() disables interrupts to protect critical section, but it doesn't prevent NMI, so the fully reentrant code cannot use local_lock_irqsave() for exclusive access. Introduce localtry_lock_t and localtry_lock_irqsave() that disables interrupts and sets acquired=1, so localtry_lock_irqsave() from NMI attempting to acquire the same lock will return false. In PREEMPT_RT local_lock_irqsave() maps to preemptible spin_lock(). Map localtry_lock_irqsave() to preemptible spin_trylock(). When in hard IRQ or NMI return false right away, since spin_trylock() is not safe due to explicit locking in the underneath rt_spin_trylock() implementation. Removing this explicit locking and attempting only "trylock" is undesired due to PI implications. Note there is no need to use local_inc for acquired variable, since it's a percpu variable with strict nesting scopes. Acked-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20250222024427.30294-2-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-27Merge tag 'sound-6.14-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of fixes. The only slightly large change is for ASoC Cirrus codec, but that's still in a normal range. All the rest are small device-specific fixes and should be fairly safe to take" * tag 'sound-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/realtek: Fix microphone regression on ASUS N705UD ALSA: hda/realtek: Fix wrong mic setup for ASUS VivoBook 15 ASoC: cs35l56: Prevent races when soft-resetting using SPI control firmware: cs_dsp: Remove async regmap writes ASoC: Intel: sof_sdw: warn both sdw and pch dmic are used ASoC: SOF: Intel: don't check number of sdw links when set dmic_fixup ASoC: dapm-graph: set fill colour of turned on nodes ASoC: fsl: Rename stream name of SAI DAI driver ASoC: es8328: fix route from DAC to output ALSA: usb-audio: Re-add sample rate quirk for Pioneer DJM-900NXS2 ASoC: tas2764: Set the SDOUT polarity correctly ASoC: tas2764: Fix power control mask ALSA: usb-audio: Avoid dropping MIDI events at closing multiple ports ASoC: tas2770: Fix volume scale
2025-02-27Merge branch 'io_uring-6.14' into for-6.15/io_uringJens Axboe
Merge mainline fixes into 6.15 branch, as upcoming patches depend on fixes that went into the 6.14 mainline branch. * io_uring-6.14: io_uring/net: save msg_control for compat io_uring/rw: clean up mshot forced sync mode io_uring/rw: move ki_complete init into prep io_uring/rw: don't directly use ki_complete io_uring/rw: forbid multishot async reads io_uring/rsrc: remove unused constants io_uring: fix spelling error in uapi io_uring.h io_uring: prevent opcode speculation io-wq: backoff when retrying worker creation
2025-02-27io_uring/nvme: pass issue_flags to io_uring_cmd_import_fixed()Pavel Begunkov
io_uring_cmd_import_fixed() will need to know the io_uring execution state in following commits, for now just pass issue_flags into it without actually using. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250224213116.3509093-5-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>