summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2025-02-28pwm: Check for CONFIG_PWM using IS_REACHABLE() in main headerUwe Kleine-König
Preparing CONFIG_PWM becoming tristate the right magic to check for the availability of the pwm functions is using IS_REACHABLE() and not IS_ENABLED(). The latter gives the wrong result for built-in code with CONFIG_PWM=m. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://lore.kernel.org/r/20250217102504.687916-2-u.kleine-koenig@baylibre.com Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
2025-02-28driver core: Introduce device_{add,remove}_of_node()Herve Codina
An of_node can be set to a device using device_set_node(), which does not prevent any of_node and/or fwnode overwrites. When adding an of_node on an already present device, the following operations need to be done: - Attach the of_node only if no of_node is already attached - Attach the of_node as a fwnode if no fwnode were already attached This is the purpose of device_add_of_node(). device_remove_of_node() reverts the operations done by device_add_of_node(). Link: https://lore.kernel.org/r/20250224141356.36325-2-herve.codina@bootlin.com Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-28mm: security: Check early if HARDENED_USERCOPY is enabledMel Gorman
HARDENED_USERCOPY is checked within a function so even if disabled, the function overhead still exists. Move the static check inline. This is at best a micro-optimisation and any difference in performance was within noise but it is relatively consistent with the init_on_* implementations. Suggested-by: Kees Cook <kees@kernel.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Link: https://lore.kernel.org/r/20250123221115.19722-4-mgorman@techsingularity.net Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-28uaccess: Introduce ucopysize.hKees Cook
The object size sanity checking macros that uaccess.h and uio.h use have been living in thread_info.h for historical reasons. Needing to use jump labels for these checks, however, introduces a header include loop under certain conditions. The dependencies for the object checking macros are very limited, but they are used by separate header files, so introduce a new header that can be used directly by uaccess.h and uio.h. As a result, this also means thread_info.h (which is rather large) and be removed from those headers. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502281153.TG2XK5SI-lkp@intel.com/ Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-28net: stmmac: provide set_clk_tx_rate() hookRussell King (Oracle)
Several stmmac sub-drivers which support RGMII follow the same pattern. They calculate the transmit clock rate, and then call clk_set_rate(). Analysis of several implementation documents suggests that the platform is responsible for providing the transmit clock to the DWMAC core's clk_tx_i. The expected rates are: 10Mbps 100Mbps 1Gbps MII 2.5MHz 25MHz RMII 2.5MHz 25MHz GMII 125MHz RGMI 2.5MHz 25MHz 125MHz It seems some platforms require this clock to be manually configured, but there are outputs from the MAC core that indicate the speed, so a platform may use these to automatically configure the clock. Thus, we can't just provide one solution to configure this clock rate. Moreover, the clock may need to be derived from one of several sources depending on the interface mode. Provide a platform hook that is passed the transmit clock, interface mode and speed. Reviewed-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1tna0F-0052sS-Lr@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-28Merge tag 'block-6.14-20250228' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - Fix plugging for native zone writes - Fix segment limit settings for != 4K page size archs - Fix for slab names overflowing * tag 'block-6.14-20250228' of git://git.kernel.dk/linux: block: fix 'kmem_cache of name 'bio-108' already exists' block: Remove zone write plugs when handling native zone append writes block: make segment size limit workable for > 4K PAGE_SIZE
2025-02-28Convert regulator drivers to useMark Brown
Merge series from Raag Jadav <raag.jadav@intel.com>: This series converts regulator drivers to use the newly introduced[1] devm_kmemdup_array() helper. [1] https://lore.kernel.org/r/20250212062513.2254767-1-raag.jadav@intel.com
2025-02-28Convert sound drivers to use devm_kmemdup_array()Mark Brown
Merge series from Raag Jadav <raag.jadav@intel.com>: This series converts sound drivers to use the newly introduced[1] devm_kmemdup_array() helper. [1] https://lore.kernel.org/r/20250212062513.2254767-1-raag.jadav@intel.com
2025-02-28Merge tag 'for-joerg' of ↵Joerg Roedel
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd into core iommu shared branch with iommufd The three dependent series on a shared branch: - Change the iommufd fault handle into an always present hwpt handle in the domain - Give iommufd its own SW_MSI implementation along with some IRQ layer rework - Improvements to the handle attach API
2025-02-28io_uring: cache nodes and mapped buffersKeith Busch
Frequent alloc/free cycles on these is pretty costly. Use an io cache to more efficiently reuse these buffers. Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250227223916.143006-7-kbusch@meta.com [axboe: fix imu leak] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28io_uring: add support for kernel registered bvecsKeith Busch
Provide an interface for the kernel to leverage the existing pre-registered buffers that io_uring provides. User space can reference these later to achieve zero-copy IO. User space must register an empty fixed buffer table with io_uring in order for the kernel to make use of it. Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250227223916.143006-5-kbusch@meta.com Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28pmdomain: Merge tag regulator-devm-of-get into nextUlf Hansson
Merge the tag regulator-devm-of-get from git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator into next. This introduces devm_of_regulator_get without the _optional suffix, since that is more sensible for the Rockchip usecase. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-02-28pmdomain: Merge tag 'v6.14-rc4' from Linus into nextUlf Hansson
Linux 6.14-rc4
2025-02-28fs: Remove page_mkwrite_check_truncate()Matthew Wilcox (Oracle)
All callers of this function have now been converted to use folio_mkwrite_check_truncate(). Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250221204421.3590340-1-willy@infradead.org Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27net: qed: make 'qed_ll2_ops_pass' as __maybe_unusedArnd Bergmann
gcc warns about unused const variables even in header files when building with W=1: In file included from include/linux/qed/qed_rdma_if.h:14, from drivers/net/ethernet/qlogic/qed/qed_rdma.h:16, from drivers/net/ethernet/qlogic/qed/qed_cxt.c:23: include/linux/qed/qed_ll2_if.h:270:33: error: 'qed_ll2_ops_pass' defined but not used [-Werror=unused-const-variable=] 270 | static const struct qed_ll2_ops qed_ll2_ops_pass = { This one is intentional, so mark it as __maybe_unused to it can be included from a file that doesn't use this variable. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> # build-tested Link: https://patch.msgid.link/20250225200926.4057723-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-28Merge branch 'ib-amlogic-a4' into develLinus Walleij
Merge immutable branch into devel for next. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2025-02-28pinctrl: pinconf-generic: Add API for pinmux propertity in DTS fileXianwei Zhao
When describing pin mux func through pinmux propertity, a standard API is added for support. The pinmux contains pin identification and mux values, which can include multiple pins. And groups configuration use other word. DTS such as: func-name { group_alias: group-name{ pinmux= <pin_id << 8 | mux_value)>, <pin_id << 8 | mux_value)>; bias-pull-up; drive-strength-microamp = <4000>; }; }; Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com> Link: https://lore.kernel.org/20250212-amlogic-pinctrl-v5-2-282bc2516804@amlogic.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2025-02-27Change inode_operations.mkdir to return struct dentry *NeilBrown
Some filesystems, such as NFS, cifs, ceph, and fuse, do not have complete control of sequencing on the actual filesystem (e.g. on a different server) and may find that the inode created for a mkdir request already exists in the icache and dcache by the time the mkdir request returns. For example, if the filesystem is mounted twice the directory could be visible on the other mount before it is on the original mount, and a pair of name_to_handle_at(), open_by_handle_at() calls could instantiate the directory inode with an IS_ROOT() dentry before the first mkdir returns. This means that the dentry passed to ->mkdir() may not be the one that is associated with the inode after the ->mkdir() completes. Some callers need to interact with the inode after the ->mkdir completes and they currently need to perform a lookup in the (rare) case that the dentry is no longer hashed. This lookup-after-mkdir requires that the directory remains locked to avoid races. Planned future patches to lock the dentry rather than the directory will mean that this lookup cannot be performed atomically with the mkdir. To remove this barrier, this patch changes ->mkdir to return the resulting dentry if it is different from the one passed in. Possible returns are: NULL - the directory was created and no other dentry was used ERR_PTR() - an error occurred non-NULL - this other dentry was spliced in This patch only changes file-systems to return "ERR_PTR(err)" instead of "err" or equivalent transformations. Subsequent patches will make further changes to some file-systems to return a correct dentry. Not all filesystems reliably result in a positive hashed dentry: - NFS, cifs, hostfs will sometimes need to perform a lookup of the name to get inode information. Races could result in this returning something different. Note that this lookup is non-atomic which is what we are trying to avoid. Placing the lookup in filesystem code means it only happens when the filesystem has no other option. - kernfs and tracefs leave the dentry negative and the ->revalidate operation ensures that lookup will be called to correctly populate the dentry. This could be fixed but I don't think it is important to any of the users of vfs_mkdir() which look at the dentry. The recommendation to use d_drop();d_splice_alias() is ugly but fits with current practice. A planned future patch will change this. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.14-rc5). Conflicts: drivers/net/ethernet/cadence/macb_main.c fa52f15c745c ("net: cadence: macb: Synchronize stats calculations") 75696dd0fd72 ("net: cadence: macb: Convert to get_stats64") https://lore.kernel.org/20250224125848.68ee63e5@canb.auug.org.au Adjacent changes: drivers/net/ethernet/intel/ice/ice_sriov.c 79990cf5e7ad ("ice: Fix deinitializing VF in error path") a203163274a4 ("ice: simplify VF MSI-X managing") net/ipv4/tcp.c 18912c520674 ("tcp: devmem: don't write truncated dmabuf CMSGs to userspace") 297d389e9e5b ("net: prefix devmem specific helpers") net/mptcp/subflow.c 8668860b0ad3 ("mptcp: reset when MPTCP opts are dropped after join") c3349a22c200 ("mptcp: consolidate subflow cleanup") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear()Ryan Roberts
In order to fix a bug, arm64 needs to be told the size of the huge page for which the huge_pte is being cleared in huge_ptep_get_and_clear(). Provide for this by adding an `unsigned long sz` parameter to the function. This follows the same pattern as huge_pte_clear() and set_huge_pte_at(). This commit makes the required interface modifications to the core mm as well as all arches that implement this function (arm64, loongarch, mips, parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed in a separate commit. Cc: stable@vger.kernel.org Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit") Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390 Link: https://lore.kernel.org/r/20250226120656.2400136-2-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-02-27bpf: Use try_alloc_pages() to allocate pages for bpf needs.Alexei Starovoitov
Use try_alloc_pages() and free_pages_nolock() for BPF needs when context doesn't allow using normal alloc_pages. This is a prerequisite for further work. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20250222024427.30294-7-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-27mm, bpf: Introduce free_pages_nolock()Alexei Starovoitov
Introduce free_pages_nolock() that can free pages without taking locks. It relies on trylock and can be called from any context. Since spin_trylock() cannot be used in PREEMPT_RT from hard IRQ or NMI it uses lockless link list to stash the pages which will be freed by subsequent free_pages() from good context. Do not use llist unconditionally. BPF maps continuously allocate/free, so we cannot unconditionally delay the freeing to llist. When the memory becomes free make it available to the kernel and BPF users right away if possible, and fallback to llist as the last resort. Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20250222024427.30294-4-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-27Merge tag 'net-6.14-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth. We didn't get netfilter or wireless PRs this week, so next week's PR is probably going to be bigger. A healthy dose of fixes for bugs introduced in the current release nonetheless. Current release - regressions: - Bluetooth: always allow SCO packets for user channel - af_unix: fix memory leak in unix_dgram_sendmsg() - rxrpc: - remove redundant peer->mtu_lock causing lockdep splats - fix spinlock flavor issues with the peer record hash - eth: iavf: fix circular lock dependency with netdev_lock - net: use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net() RDMA driver register notifier after the device Current release - new code bugs: - ethtool: fix ioctl confusing drivers about desired HDS user config - eth: ixgbe: fix media cage present detection for E610 device Previous releases - regressions: - loopback: avoid sending IP packets without an Ethernet header - mptcp: reset connection when MPTCP opts are dropped after join Previous releases - always broken: - net: better track kernel sockets lifetime - ipv6: fix dst ref loop on input in seg6 and rpl lw tunnels - phy: qca807x: use right value from DTS for DAC_DSP_BIAS_CURRENT - eth: enetc: number of error handling fixes - dsa: rtl8366rb: reshuffle the code to fix config / build issue with LED support" * tag 'net-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits) net: ti: icss-iep: Reject perout generation request idpf: fix checksums set in idpf_rx_rsc() selftests: drv-net: Check if combined-count exists net: ipv6: fix dst ref loop on input in rpl lwt net: ipv6: fix dst ref loop on input in seg6 lwt usbnet: gl620a: fix endpoint checking in genelink_bind() net/mlx5: IRQ, Fix null string in debug print net/mlx5: Restore missing trace event when enabling vport QoS net/mlx5: Fix vport QoS cleanup on error net: mvpp2: cls: Fixed Non IP flow, with vlan tag flow defination. af_unix: Fix memory leak in unix_dgram_sendmsg() net: Handle napi_schedule() calls from non-interrupt net: Clear old fragment checksum value in napi_reuse_skb gve: unlink old napi when stopping a queue using queue API net: Use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net(). tcp: Defer ts_recent changes until req is owned net: enetc: fix the off-by-one issue in enetc_map_tx_tso_buffs() net: enetc: remove the mm_lock from the ENETC v4 driver net: enetc: add missing enetc4_link_deinit() net: enetc: update UDP checksum when updating originTimestamp field ...
2025-02-27mm, bpf: Introduce try_alloc_pages() for opportunistic page allocationAlexei Starovoitov
Tracing BPF programs execute from tracepoints and kprobes where running context is unknown, but they need to request additional memory. The prior workarounds were using pre-allocated memory and BPF specific freelists to satisfy such allocation requests. Instead, introduce gfpflags_allow_spinning() condition that signals to the allocator that running context is unknown. Then rely on percpu free list of pages to allocate a page. try_alloc_pages() -> get_page_from_freelist() -> rmqueue() -> rmqueue_pcplist() will spin_trylock to grab the page from percpu free list. If it fails (due to re-entrancy or list being empty) then rmqueue_bulk()/rmqueue_buddy() will attempt to spin_trylock zone->lock and grab the page from there. spin_trylock() is not safe in PREEMPT_RT when in NMI or in hard IRQ. Bailout early in such case. The support for gfpflags_allow_spinning() mode for free_page and memcg comes in the next patches. This is a first step towards supporting BPF requirements in SLUB and getting rid of bpf_mem_alloc. That goal was discussed at LSFMM: https://lwn.net/Articles/974138/ Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20250222024427.30294-3-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-27locking/local_lock: Introduce localtry_lock_tSebastian Andrzej Siewior
In !PREEMPT_RT local_lock_irqsave() disables interrupts to protect critical section, but it doesn't prevent NMI, so the fully reentrant code cannot use local_lock_irqsave() for exclusive access. Introduce localtry_lock_t and localtry_lock_irqsave() that disables interrupts and sets acquired=1, so localtry_lock_irqsave() from NMI attempting to acquire the same lock will return false. In PREEMPT_RT local_lock_irqsave() maps to preemptible spin_lock(). Map localtry_lock_irqsave() to preemptible spin_trylock(). When in hard IRQ or NMI return false right away, since spin_trylock() is not safe due to explicit locking in the underneath rt_spin_trylock() implementation. Removing this explicit locking and attempting only "trylock" is undesired due to PI implications. Note there is no need to use local_inc for acquired variable, since it's a percpu variable with strict nesting scopes. Acked-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20250222024427.30294-2-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-27io_uring/nvme: pass issue_flags to io_uring_cmd_import_fixed()Pavel Begunkov
io_uring_cmd_import_fixed() will need to know the io_uring execution state in following commits, for now just pass issue_flags into it without actually using. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250224213116.3509093-5-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-27drivers: base: component: add function to query the bound statusHeiko Stuebner
The component helpers already expose the bound status in debugfs, but at times it might be necessary to also check that state in the kernel and act differently depending on the result. For example the shutdown handler of a drm-driver might need to stop a whole output pipeline if the drm device is up and running, but may run into problems if that drm-device has never been set up before, for example because the binding deferred. So add a little helper that returns the bound status for a componet device. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://patchwork.freedesktop.org/patch/msgid/20250220234141.2788785-2-heiko@sntech.de
2025-02-27regcache: Add support for sorting defaults arraysCharles Keepax
The defaults array in regcache must be sorted into ascending register address order, because binary search is used to locate values in the array. Add a helper to sort the register defaults array which can be useful for systems that dynamically create a defaults array based on external information. Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.dev> Link: https://patch.msgid.link/20250217140159.2288784-2-ckeepax@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-27net: skbuff: introduce napi_skb_cache_get_bulk()Alexander Lobakin
Add a function to get an array of skbs from the NAPI percpu cache. It's supposed to be a drop-in replacement for kmem_cache_alloc_bulk(skbuff_head_cache, GFP_ATOMIC) and xdp_alloc_skb_bulk(GFP_ATOMIC). The difference (apart from the requirement to call it only from the BH) is that it tries to use as many NAPI cache entries for skbs as possible, and allocate new ones only if needed. The logic is as follows: * there is enough skbs in the cache: decache them and return to the caller; * not enough: try refilling the cache first. If there is now enough skbs, return; * still not enough: try allocating skbs directly to the output array with %GFP_ZERO, maybe we'll be able to get some. If there's now enough, return; * still not enough: return as many as we were able to obtain. Most of times, if called from the NAPI polling loop, the first one will be true, sometimes (rarely) the second one. The third and the fourth -- only under heavy memory pressure. It can save significant amounts of CPU cycles if there are GRO cycles and/or Tx completion cycles (anything that descends to napi_skb_cache_put()) happening on this CPU. Tested-by: Daniel Xu <dxu@dxuuu.xyz> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-27net: gro: decouple GRO from the NAPI layerAlexander Lobakin
In fact, these two are not tied closely to each other. The only requirements to GRO are to use it in the BH context and have some sane limits on the packet batches, e.g. NAPI has a limit of its budget (64/8/etc.). Move purely GRO fields into a new structure, &gro_node. Embed it into &napi_struct and adjust all the references. gro_node::cached_napi_id is effectively the same as napi_struct::napi_id, but to be used on GRO hotpath to mark skbs. napi_struct::napi_id is now a fully control path field. Three Ethernet drivers use napi_gro_flush() not really meant to be exported, so move it to <net/gro.h> and add that include there. napi_gro_receive() is used in more than 100 drivers, keep it in <linux/netdevice.h>. This does not make GRO ready to use outside of the NAPI context yet. Tested-by: Daniel Xu <dxu@dxuuu.xyz> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-27iomap: make buffered writes work with RWF_DONTCACHEJens Axboe
Add iomap buffered write support for RWF_DONTCACHE. If RWF_DONTCACHE is set for a write, mark the folios being written as uncached. Then writeback completion will drop the pages. The write_iter handler simply kicks off writeback for the pages, and writeback completion will take care of the rest. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20250204184047.356762-2-axboe@kernel.dk Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27Merge patch series "prep patches for my mkdir series"Christian Brauner
NeilBrown <neilb@suse.de> says: These two patches are cleanup are dependencies for my mkdir changes and subsequence directory locking changes. * patches from https://lore.kernel.org/r/20250226062135.2043651-1-neilb@suse.de: (2 commits) nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked() nfs/vfs: discard d_exact_alias() Link: https://lore.kernel.org/r/20250226062135.2043651-1-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-26net: move aRFS rmap management and CPU affinity to coreAhmed Zaki
A common task for most drivers is to remember the user-set CPU affinity to its IRQs. On each netdev reset, the driver should re-assign the user's settings to the IRQs. Unify this task across all drivers by moving the CPU affinity to napi->config. However, to move the CPU affinity to core, we also need to move aRFS rmap management since aRFS uses its own IRQ notifiers. For the aRFS, add a new netdev flag "rx_cpu_rmap_auto". Drivers supporting aRFS should set the flag via netif_enable_cpu_rmap() and core will allocate and manage the aRFS rmaps. Freeing the rmap is also done by core when the netdev is freed. For better IRQ affinity management, move the IRQ rmap notifier inside the napi_struct and add new notify.notify and notify.release functions: netif_irq_cpu_rmap_notify() and netif_napi_affinity_release(). Now we have the aRFS rmap management in core, add CPU affinity mask to napi_config. To delegate the CPU affinity management to the core, drivers must: 1 - set the new netdev flag "irq_affinity_auto": netif_enable_irq_affinity(netdev) 2 - create the napi with persistent config: netif_napi_add_config() 3 - bind an IRQ to the napi instance: netif_napi_set_irq() the core will then make sure to use re-assign affinity to the napi's IRQ. The default IRQ mask is set to one cpu starting from the closest NUMA. Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Link: https://patch.msgid.link/20250224232228.990783-2-ahmed.zaki@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26net: skb: free up one bit in tx_flagsWillem de Bruijn
The linked series wants to add skb tx completion timestamps. That needs a bit in skb_shared_info.tx_flags, but all are in use. A per-skb bit is only needed for features that are configured on a per packet basis. Per socket features can be read from sk->sk_tsflags. Per packet tsflags can be set in sendmsg using cmsg, but only those in SOF_TIMESTAMPING_TX_RECORD_MASK. Per packet tsflags can also be set without cmsg by sandwiching a send inbetween two setsockopts: val |= SOF_TIMESTAMPING_$FEATURE; setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val)); write(fd, buf, sz); val &= ~SOF_TIMESTAMPING_$FEATURE; setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val)); Changing a datapath test from skb_shinfo(skb)->tx_flags to skb->sk->sk_tsflags can change behavior in that case, as the tx_flags is written before the second setsockopt updates sk_tsflags. Therefore, only bits can be reclaimed that cannot be set by cmsg and are also highly unlikely to be used to target individual packets otherwise. Free up the bit currently used for SKBTX_HW_TSTAMP_USE_CYCLES. This selects between clock and free running counter source for HW TX timestamps. It is probable that all packets of the same socket will always use the same source. Link: https://lore.kernel.org/netdev/cover.1739988644.git.pav@iki.fi/ Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Link: https://patch.msgid.link/20250225023416.2088705-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26tcp: be less liberal in TSEcr received while in SYN_RECV stateEric Dumazet
Yong-Hao Zou mentioned that linux was not strict as other OS in 3WHS, for flows using TCP TS option (RFC 7323) As hinted by an old comment in tcp_check_req(), we can check the TSEcr value in the incoming packet corresponds to one of the SYNACK TSval values we have sent. In this patch, I record the oldest and most recent values that SYNACK packets have used. Send a challenge ACK if we receive a TSEcr outside of this range, and increase a new SNMP counter. nstat -az | grep TSEcrRejected TcpExtTSEcrRejected 0 0.0 Due to TCP fastopen implementation, do not apply yet these checks for fastopen flows. v2: No longer use req->num_timeout, but treq->snt_tsval_first to detect when first SYNACK is prepared. This means we make sure to not send an initial zero TSval. Make sure MPTCP and TCP selftests are passing. Change MIB name to TcpExtTSEcrRejected v1: https://lore.kernel.org/netdev/CADVnQykD8i4ArpSZaPKaoNxLJ2if2ts9m4As+=Jvdkrgx1qMHw@mail.gmail.com/T/ Reported-by: Yong-Hao Zou <yonghaoz1994@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250225171048.3105061-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26KVM: arm64: Specify hypercall ABI for retrieving target implementationsShameer Kolothum
If the Guest requires migration to multiple targets, these hypercalls will provide a way to retrieve the target CPU implementations from the user space VMM. Subsequent patch will use this to enable the associated errata. Suggested-by: Oliver Upton <oliver.upton@linux.dev> Suggested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Sebastian Ott <sebott@redhat.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20250221140229.12588-3-shameerali.kolothum.thodi@huawei.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-26Merge tag 'nfs-for-6.14-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client fixes from Anna Schumaker: "Stable Fixes: - O_DIRECT writes should adjust file length Other Bugfixes: - Adjust delegated timestamps for O_DIRECT reads and writes - Prevent looping due to rpc_signal_task() races - Fix a deadlock when recovering state on a sillyrenamed file - Properly handle -ETIMEDOUT errors from tlshd - Suppress build warnings for unused procfs functions - Fix memory leak of lsm_contexts" * tag 'nfs-for-6.14-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: lsm,nfs: fix memory leak of lsm_context sunrpc: suppress warnings for unused procfs functions SUNRPC: Handle -ETIMEDOUT return from tlshd NFSv4: Fix a deadlock when recovering state on a sillyrenamed file SUNRPC: Prevent looping due to rpc_signal_task() races NFS: Adjust delegated timestamps for O_DIRECT reads and writes NFS: O_DIRECT writes must check and adjust the file length
2025-02-26of: Align macro MAX_PHANDLE_ARGS with NR_FWNODE_REFERENCE_ARGSZijun Hu
Macro NR_FWNODE_REFERENCE_ARGS defines the maximal argument count for firmware node reference, and MAX_PHANDLE_ARGS defines the maximal argument count for DT node reference, both have the same value now. To void argument count inconsistency between firmware and DT, simply align both macros by '#define MAX_PHANDLE_ARGS NR_FWNODE_REFERENCE_ARGS'. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20250225-fix_arg_count-v4-2-13cdc519eb31@quicinc.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2025-02-26of: property: Increase NR_FWNODE_REFERENCE_ARGSZijun Hu
Currently, the following two macros have different values: // The maximal argument count for firmware node reference #define NR_FWNODE_REFERENCE_ARGS 8 // The maximal argument count for DT node reference #define MAX_PHANDLE_ARGS 16 It may cause firmware node reference's argument count out of range if directly assign DT node reference's argument count to firmware's. drivers/of/property.c:of_fwnode_get_reference_args() is doing the direct assignment, so may cause firmware's argument count @args->nargs got out of range, namely, in [9, 16]. Fix by increasing NR_FWNODE_REFERENCE_ARGS to 16 to meet DT requirement. Will align both macros later to avoid such inconsistency. Fixes: 3e3119d3088f ("device property: Introduce fwnode_property_get_reference_args") Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Link: https://lore.kernel.org/r/20250225-fix_arg_count-v4-1-13cdc519eb31@quicinc.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2025-02-26perf: Remove unnecessary parameter of security checkLuo Gengkun
It seems that the attr parameter was never been used in security checks since it was first introduced by: commit da97e18458fb ("perf_event: Add support for LSM and SELinux checks") so remove it. Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-02-26KVM: Drop kvm_arch_sync_events() now that all implementations are nopsSean Christopherson
Remove kvm_arch_sync_events() now that x86 no longer uses it (no other arch has ever used it). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Bibo Mao <maobibo@loongson.cn> Message-ID: <20250224235542.2562848-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-26PM: clk: remove unused of_pm_clk_add_clk()Dr. David Alan Gilbert
The last use of of_pm_clk_add_clk() was removed by 2019's commit fe00f8900ca7 ("irqchip/gic-pm: Update driver to use clk_bulk APIs") Remove it. Note that the plural version of_pm_clk_add_clks() is still being used and is left. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patch.msgid.link/20250224010610.187503-1-linux@treblig.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-02-26x86/cfi: Add 'cfi=warn' boot optionPeter Zijlstra
Rebuilding with CONFIG_CFI_PERMISSIVE=y enabled is such a pain, esp. since clang is so slow. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250224124159.924496481@infradead.org
2025-02-26gpiolib: introduce gpio_chip setters that return valuesBartosz Golaszewski
Add new variants of the set() and set_multiple() callbacks that have integer return values allowing to indicate failures to users of the GPIO consumer API. Until we convert all GPIO providers treewide to using them, they will live in parallel to the existing ones. Make sure that providers cannot define both. Prefer the new ones and only use the old ones as fallback. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://lore.kernel.org/r/20250220-gpio-set-retval-v2-5-bc4cfd38dae3@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-02-26gpiolib: make value setters have return valuesBartosz Golaszewski
Change the in-kernel consumer interface for GPIOs: make all variants of value setters that don't have a return value, return a signed integer instead. That will allow these routines to indicate failures to callers. This doesn't change the implementation just yet, we'll do it in subsequent commits. We need to update the gpio-latch module as it passes the address of value setters as a function pointer argument and thus cares about its type. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://lore.kernel.org/r/20250220-gpio-set-retval-v2-2-bc4cfd38dae3@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-02-26EDAC: Update memory repair control interface for memory sparing featureShiju Jose
Update memory repair control interface for memory sparing feature. CXL memory devices can support soft and hard memory sparing at cacheline, row, bank and rank granularities. Memory sparing is defined as a repair function that replaces a portion of memory with a portion of functional memory at that same granularity. When a CXL device detects an error in memory, it will report to the host that there's need for a repair maintenance operation by using an event record where the "maintenance needed" flag is set. The event records contain the device physical address (DPA) and other attributes of the memory to repair such as bank group, bank, rank, row, column, channel etc. The kernel will report the corresponding CXL general media or DRAM trace event to userspace, and userspace tools (e.g. rasdaemon) will initiate a repair operation in response to the device request via the sysfs repair control. [ bp: Massage. ] Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20250212143654.1893-15-shiju.jose@huawei.com
2025-02-26gpiolib: use the required minimum set of headersBartosz Golaszewski
Andy suggested we should keep a fine-grained scheme for includes and only pull in stuff required within individual ifdef sections. Let's revert commit dea69f2d1cc8 ("gpiolib: move all includes to the top of gpio/consumer.h") and make the headers situation even more fine-grained by only including the first level headers containing requireded symbols except for bug.h where checkpatch.pl warns against including asm/bug.h. Fixes: dea69f2d1cc8 ("gpiolib: move all includes to the top of gpio/consumer.h") Suggested-by: Andy Shevchenko <andriy.shevchenko@intel.com> Closes: https://lore.kernel.org/all/Z7XPcYtaA4COHDYj@smile.fi.intel.com/ Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Link: https://lore.kernel.org/r/20250225095210.25910-1-brgl@bgdev.pl Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-02-26EDAC: Add a memory repair control featureShiju Jose
Add a generic EDAC memory repair control driver to manage memory repairs in the system, such as CXL Post Package Repair (PPR) and other soft and hard PPR features. For example, a CXL device with DRAM components that support PPR features may implement PPR maintenance operations. DRAM components may support two types of PPR: - hard PPR, for a permanent row repair, and - soft PPR, for a temporary row repair. Soft PPR is much faster than hard PPR, but the repair is lost with a power cycle. When a CXL device detects an error in a memory, it may report the need for a repair maintenance operation by using an event record where the "maintenance needed" flag is set. The event records contain the device physical address (DPA) and other optional attributes of the memory to repair. The kernel will report the corresponding CXL general media or DRAM trace event to userspace, and userspace tools (e.g. rasdaemon) will initiate a repair operation in response to the device request via the sysfs repair control. Device with memory repair features registers with EDAC device driver, which retrieves a memory repair descriptor from EDAC memory repair driver and exposes the sysfs repair control attributes to userspace in /sys/bus/edac/devices/<dev-name>/mem_repairX/. The common memory repair control interface abstracts the control of arbitrary memory repair functionality into a standardized set of functions. The sysfs memory repair attribute nodes are only available if the client driver has implemented the corresponding attribute callback function and provided operations to the EDAC device driver during registration. [ bp: Massage, fixup edac_dev_register() retvals, merge write_overflow fix to mem_repair_create_desc() ] Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20250212143654.1893-5-shiju.jose@huawei.com
2025-02-26nfs/vfs: discard d_exact_alias()NeilBrown
d_exact_alias() is a descendent of d_add_unique() which was introduced 20 years ago mostly likely to work around problems with NFS servers of the time. It is now not used in several situations were it was originally needed and there have been no reports of problems - presumably the old NFS servers have been improved. This only place it is now use is in NFSv4 code and the old problematic servers are thought to have been v2/v3 only. There is no clear benefit in reusing a unhashed() dentry which happens to have the same name as the dentry we are adding. So this patch removes d_exact_alias() and the one place that it is used. Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250226062135.2043651-2-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-26iomap: introduce a full map advance helperBrian Foster
Various iomap_iter_advance() calls advance by the full mapping length and thus have no need for the current length input or post-advance remaining length output from the standard advance function. Add an iomap_iter_advance_full() helper to clean up these cases. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20250224144757.237706-13-bfoster@redhat.com Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>