linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-03-20	Merge branch 'kvm-nvmx-and-vm-teardown' into HEAD	Paolo Bonzini
	The immediate issue being fixed here is a nVMX bug where KVM fails to detect that, after nested VM-Exit, L1 has a pending IRQ (or NMI). However, checking for a pending interrupt accesses the legacy PIC, and x86's kvm_arch_destroy_vm() currently frees the PIC before destroying vCPUs, i.e. checking for IRQs during the forced nested VM-Exit results in a NULL pointer deref; that's a prerequisite for the nVMX fix. The remaining patches attempt to bring a bit of sanity to x86's VM teardown code, which has accumulated a lot of cruft over the years. E.g. KVM currently unloads each vCPU's MMUs in a separate operation from destroying vCPUs, all because when guest SMP support was added, KVM had a kludgy MMU teardown flow that broke when a VM had more than one 1 vCPU. And that oddity lived on, for 18 years... Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-03-20	Merge tag 'kvmarm-6.15' of ↵	Paolo Bonzini
	https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.15 - Nested virtualization support for VGICv3, giving the nested hypervisor control of the VGIC hardware when running an L2 VM - Removal of 'late' nested virtualization feature register masking, making the supported feature set directly visible to userspace - Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers - Paravirtual interface for discovering the set of CPU implementations where a VM may run, addressing a longstanding issue of guest CPU errata awareness in big-little systems and cross-implementation VM migration - Userspace control of the registers responsible for identifying a particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1), allowing VMs to be migrated cross-implementation - pKVM updates, including support for tracking stage-2 page table allocations in the protected hypervisor in the 'SecPageTable' stat - Fixes to vPMU, ensuring that userspace updates to the vPMU after KVM_RUN are reflected into the backing perf events
2025-03-20	cgroup: rstat: Cleanup flushing functions and locking	Yosry Ahmed
	Now that the rstat lock is being re-acquired on every CPU iteration in cgroup_rstat_flush_locked(), having the initially acquire the lock is unnecessary and unclear. Inline cgroup_rstat_flush_locked() into cgroup_rstat_flush() and move the lock/unlock calls to the beginning and ending of the loop body to make the critical section obvious. cgroup_rstat_flush_hold/release() do not make much sense with the lock being dropped and reacquired internally. Since it has no external callers, remove it and explicitly acquire the lock in cgroup_base_stat_cputime_show() instead. This leaves the code with a single flushing function, cgroup_rstat_flush(). Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-20	Merge tag 'net-6.14-rc8' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from can, bluetooth and ipsec. This contains a last minute revert of a recent GRE patch, mostly to allow me stating there are no known regressions outstanding. Current release - regressions: - revert "gre: Fix IPv6 link-local address generation." - eth: ti: am65-cpsw: fix NAPI registration sequence Previous releases - regressions: - ipv6: fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw(). - mptcp: fix data stream corruption in the address announcement - bluetooth: fix connection regression between LE and non-LE adapters - can: - flexcan: only change CAN state when link up in system PM - ucan: fix out of bound read in strscpy() source Previous releases - always broken: - lwtunnel: fix reentry loops - ipv6: fix TCP GSO segmentation with NAT - xfrm: force software GSO only in tunnel mode - eth: ti: icssg-prueth: add lock to stats Misc: - add Andrea Mayer as a maintainer of SRv6" * tag 'net-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits) MAINTAINERS: Add Andrea Mayer as a maintainer of SRv6 Revert "gre: Fix IPv6 link-local address generation." Revert "selftests: Add IPv6 link-local address generation tests for GRE devices." net/neighbor: add missing policy for NDTPA_QUEUE_LENBYTES tools headers: Sync uapi/asm-generic/socket.h with the kernel sources mptcp: Fix data stream corruption in the address announcement selftests: net: test for lwtunnel dst ref loops net: ipv6: ioam6: fix lwtunnel_output() loop net: lwtunnel: fix recursion loops net: ti: icssg-prueth: Add lock to stats net: atm: fix use after free in lec_send() xsk: fix an integer overflow in xp_create_and_assign_umem() net: stmmac: dwc-qos-eth: use devm_kzalloc() for AXI data selftests: drv-net: use defer in the ping test phy: fix xa_alloc_cyclic() error handling dpll: fix xa_alloc_cyclic() error handling devlink: fix xa_alloc_cyclic() error handling ipv6: Set errno after ip_fib_metrics_init() in ip6_route_info_create(). ipv6: Fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw(). net: ipv6: fix TCP GSO segmentation with NAT ...
2025-03-20	ASoC: wm8904: Add DMIC and DRC support	Mark Brown
	Merge series from Francesco Dolcini <francesco@dolcini.it>: This patch series adds DMIC and DRC support to the WM8904 driver, a new of_ helper is added to simplify the driver code. DRC functionality is added in the same patch series to provide the necessary dynamic range control to make DMIC support useful. The WM8904 supports digital microphones on two of its inputs: IN1L/DMICDAT1 and IN1R/DMICDAT2. These two inputs can either be connected to an ADC or to the DMIC system. There is an ADC for each line, and only one DMIC block. This DMIC block is either connected to DMICDAT1 or to DMICDAT2. One DMIC data line supports two digital microphones via time multiplexing. The pin's functionality is decided during hardware design (IN1L vs DMICDAT1 and IN1R vs DMICDAT2). This is reflected in the Device Tree. If one line is analog and one is DMIC, we need to be able to switch between ADC and DMIC at runtime. The DMIC source is known from the Device Tree. If both are DMIC inputs, we need to be able to switch the DMIC source. There is no need to switch between ADC and DMIC at runtime. Therefore, kcontrols are dynamically added by the driver depending on its Device Tree configuration. This is a heavy rework of a previous patch series provided by Alifer Moraes and Pierluigi Passaro, https://lore.kernel.org/lkml/20220307141041.27538-1-alifer.m@variscite.com.
2025-03-20	tty: serdev: drop serdev_controller_ops::write_room()	Jiri Slaby (SUSE)
	In particular, serdev_device_write_room() is not called, so the whole serdev's write_room() can go. Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20250317070046.24386-17-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-20	tty: tty_driver: introduce TTY driver sub/types enums	Jiri Slaby (SUSE)
	Convert TTY_DRIVER_TYPE_*, and subtype macros to two enums: tty_driver_type and tty_driver_subtype. This allows for easier kernel-doc (later), grouping of these nicely, and proper checking. The tty_driver's ::type and ::subtype now use these enums instead of bare "short". Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20250317070046.24386-16-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-20	tty: tty_driver: document both {,__}tty_alloc_driver() properly	Jiri Slaby (SUSE)
	__tty_alloc_driver()'s kernel-doc needed some care: describe the return value using the standard "Returns:", and use the new enum tty_driver_flag for @flags. Then, the tty_alloc_driver() macro was undocumented, but referenced many times in the docs. Copy the docs from the above (except the @owner parameter, obviously). Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20250317070046.24386-15-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-20	tty: tty_driver: convert "TTY Driver Flags" to an enum	Jiri Slaby (SUSE)
	Convert TTY_DRIVER_* macros (flags) to an enum. This allows for easier kernel-doc (the comment needed fine tuning), grouping of these nicely, and proper checking. Given these are flags, define them using modern BIT() instead of hex constants. It turns out (thanks, kernel-doc checker) that internal TTY_DRIVER_INSTALLED was undocumented. Fix that too. Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20250317070046.24386-14-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-20	tty: tty_driver: move TTY macros to the top	Jiri Slaby (SUSE)
	So that they can be referenced in structs once converted to enums (in the next patches). Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20250317070046.24386-13-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-20	tty: move N_TTY_BUF_SIZE to n_tty	Jiri Slaby (SUSE)
	"N_TTY_BUF_SIZE" is private to n_tty and shall not be exposed to the world. Definitely not in tty.h somewhere in the middle of "struct tty_struct". This is a remnant of moving "read_flags" to "struct n_tty_data" in commit 3fe780b379fa ("TTY: move ldisc data from tty_struct: bitmaps"). But some cleanup was needed first (in previous patches). Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20250317070046.24386-5-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-20	tty: convert "TTY Struct Flags" to an enum	Jiri Slaby (SUSE)
	Convert TTY_* macros (flags) to an enum. This allows for easier kernel-doc (the comment needed fine tuning), grouping of these nicely, and proper checking. Note that these are bit positions. So they are used such as test_bit(TTY_THROTTLED, ...). Given these are not the user API (only in-kernel API/ABI), the bit positions are NOT preserved in this patch. All are renumbered naturally using the enum-auto-numbering. Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20250317070046.24386-2-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-20	iomap: rework IOMAP atomic flags	John Garry
	Flag IOMAP_ATOMIC_SW is not really required. The idea of having this flag is that the FS ->iomap_begin callback could check if this flag is set to decide whether to do a SW (FS-based) atomic write. But the FS can set which ->iomap_begin callback it wants when deciding to do a FS-based atomic write. Furthermore, it was thought that IOMAP_ATOMIC_HW is not a proper name, as the block driver can use SW-methods to emulate an atomic write. So change back to IOMAP_ATOMIC. The ->iomap_begin callback needs though to indicate to iomap core that REQ_ATOMIC needs to be set, so add IOMAP_F_ATOMIC_BIO for that. These changes were suggested by Christoph Hellwig and Dave Chinner. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20250320120250.4087011-4-john.g.garry@oracle.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-20	Merge tag 'coresight-next-v6.15' of ↵	Greg Kroah-Hartman
	ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/coresight/linux into char-misc-next Suzuki writes: coresight: updates for Linux v6.15 CoreSight self-hosted tracing driver subsystem update for Linux v6.15. The update includes: - CoreSight trace capture for Panic/Watchdog timeouts - Fixes to ETM4x driver to synchronize register reads as required by the TRM - Support for Qualcomm CoreSight TMC Control Unit driver - Conversion of device locks to raw_spinlock for components that are used by the Perf mode. - Miscellaneous fixes for the subsystem Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> * tag 'coresight-next-v6.15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/coresight/linux: (41 commits) Coresight: Fix a NULL vs IS_ERR() bug in probe coresight: configfs: Constify struct config_item_type coresight: docs: Remove target sink from examples coresight/ultrasoc: change smb_drv_data spinlock's type to raw_spinlock_t coresight-tmc: change tmc_drvdata spinlock's type to raw_spinlock_t coresight-replicator: change replicator_drvdata spinlock's type to raw_spinlock_t coresight-funnel: change funnel_drvdata spinlock's type to raw_spinlock_t coresight-etb10: change etb_drvdata spinlock's type to raw_spinlock_t coresight-cti: change cti_drvdata spinlock's type to raw_spinlock_t coresight: change coresight_trace_id_map's lock type to raw_spinlock_t coresight-etm4x: change etmv4_drvdata spinlock type to raw_spinlock_t coresight: change coresight_device lock type to raw_spinlock_t coresight: add verification process for coresight_etm_get_trace_id Coresight: Add Coresight TMC Control Unit driver dt-bindings: arm: Add Coresight TMC Control Unit hardware Coresight: Change functions to accept the coresight_path Coresight: Change to read the trace ID from coresight_path Coresight: Allocate trace ID after building the path Coresight: Introduce a new struct coresight_path Coresight: Use coresight_etm_get_trace_id() in traceid_show() ...
2025-03-20	ASoC: wm8904: get platform data from DT	Ernest Van Hoecke
	Read in optional codec-specific properties from the device tree. The platform_data structure is not populated when using device trees. This change parses optional dts properties to populate it. - wlf,in1l-as-dmicdat1 - wlf,in1r-as-dmicdat2 - wlf,gpio-cfg - wlf,micbias-cfg - wlf,drc-cfg-regs - wlf,drc-cfg-names - wlf,retune-mobile-cfg-regs - wlf,retune-mobile-cfg-names - wlf,retune-mobile-cfg-hz Datasheet: https://statics.cirrus.com/pubs/proDatasheet/WM8904_Rev4.1.pdf Signed-off-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com> Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com> Link: https://patch.msgid.link/20250319142059.46692-5-francesco@dolcini.it Signed-off-by: Mark Brown <broonie@kernel.org>
2025-03-20	of: Add of_property_read_u16_index	Ernest Van Hoecke
	There is an of_property_read_u32_index and of_property_read_u64_index. This patch adds a similar helper for u16. Signed-off-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com> Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com> Link: https://patch.msgid.link/20250319142059.46692-2-francesco@dolcini.it Signed-off-by: Mark Brown <broonie@kernel.org>
2025-03-20	Merge branch 'slab/for-6.15/kfree_rcu_tiny' into slab/for-next	Vlastimil Babka
	Merge the slab feature branch kfree_rcu_tiny for 6.15: - Move the TINY_RCU kvfree_rcu() implementation from RCU to SLAB subsystem and cleanup its integration.
2025-03-20	mptcp: sysctl: add available_path_managers	Geliang Tang
	Similarly to net.mptcp.available_schedulers, this patch adds a new one net.mptcp.available_path_managers to list the available path managers. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-11-f4e4a88efc50@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20	mptcp: pm: define struct mptcp_pm_ops	Geliang Tang
	In order to allow users to develop their own BPF-based path manager, this patch defines a struct ops "mptcp_pm_ops" for an MPTCP path manager, which contains a set of interfaces. Currently only init() and release() interfaces are included, subsequent patches will add others step by step. Add a set of functions to register, unregister, find and validate a given path manager struct ops. "list" is used to add this path manager to mptcp_pm_list list when it is registered. "name" is used to identify this path manager. mptcp_pm_find() uses "name" to find a path manager on the list. mptcp_pm_unregister is not used in this set, but will be invoked in .unreg of struct bpf_struct_ops. mptcp_pm_validate() will be invoked in .validate of struct bpf_struct_ops. That's why they are exported. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-6-f4e4a88efc50@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20	cpuidle, sched: Use smp_mb__after_atomic() in current_clr_polling()	Yujun Dong
	In architectures that use the polling bit, current_clr_polling() employs smp_mb() to ensure that the clearing of the polling bit is visible to other cores before checking TIF_NEED_RESCHED. However, smp_mb() can be costly. Given that clear_bit() is an atomic operation, replacing smp_mb() with smp_mb__after_atomic() is appropriate. Many architectures implement smp_mb__after_atomic() as a lighter-weight barrier compared to smp_mb(), leading to performance improvements. For instance, on x86, smp_mb__after_atomic() is a no-op. This change eliminates a smp_mb() instruction in the cpuidle wake-up path, saving several CPU cycles and thereby reducing wake-up latency. Architectures that do not use the polling bit will retain the original smp_mb() behavior to ensure that existing dependencies remain unaffected. Signed-off-by: Yujun Dong <yujundong@pascal-lab.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20241230141624.155356-1-yujundong@pascal-lab.net
2025-03-20	fs: reduce work in fdget_pos()	Mateusz Guzik
	1. predict the file was found 2. explicitly compare the ref to "one", ignoring the dead zone The latter arguably improves the behavior to begin with. Suppose the count turned bad -- the previously used ref routine is going to check for it and return 0, indicating the count does not necessitate taking ->f_pos_lock. But there very well may be several users. i.e. not paying for special-casing the dead zone improves semantics. While here spell out each condition in a dedicated if statement. This has no effect on generated code. Sizes are as follows (in bytes; gcc 13, x86-64): stock: 321 likely(): 298 likely()+ref: 280 Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250319215801.1870660-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-20	Merge branches 'apple/dart', 'arm/smmu/updates', 'arm/smmu/bindings', ↵	Joerg Roedel
	'rockchip', 's390', 'core', 'intel/vt-d' and 'amd/amd-vi' into next
2025-03-20	net: phy: Support speed selection for PHY loopback	Gerhard Engleder
	phy_loopback() leaves it to the PHY driver to select the speed of the loopback mode. Thus, the speed of the loopback mode depends on the PHY driver in use. Add support for speed selection to phy_loopback() to enable loopback with defined speeds. Ensure that link up is signaled if speed changes as speed is not allowed to change during link up. Link down and up is necessary for a new speed. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Link: https://patch.msgid.link/20250312203010.47429-3-gerhard@engleder-embedded.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20	net: phy: Allow loopback speed selection for PHY drivers	Gerhard Engleder
	PHY drivers support loopback mode, but it is not possible to select the speed of the loopback mode. The speed is chosen by the set_loopback() operation of the PHY driver. Same is valid for genphy_loopback(). There are PHYs that support loopback with different speeds. Extend set_loopback() to make loopback speed selection possible. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250312203010.47429-2-gerhard@engleder-embedded.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-19	Merge tag 'qcom-drivers-for-6.15' of ↵	Arnd Bergmann
	https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/drivers Qualcomm driver updates for v6.15 Improve the client interface for the Qualcomm ICE driver to avoid leaking references, including fixing the client drivers to call the new function. Adopt str_on_off() helper in AOSS driver and mark non-global servreg QMI element info array in the PDR driver static. * tag 'qcom-drivers-for-6.15' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: soc: qcom: Do not expose internal servreg_location_entry_ei array soc: qcom: ice: make of_qcom_ice_get() static scsi: ufs: qcom: fix dev reference leaked through of_qcom_ice_get mmc: sdhci-msm: fix dev reference leaked through of_qcom_ice_get soc: qcom: ice: introduce devm_of_qcom_ice_get dt-bindings: soc: qcom: qcom,pmic-glink: Document SM8750 compatible soc: qcom: Use str_enable_disable-like helpers Link: https://lore.kernel.org/r/20250317210158.2025380-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-03-19	Merge branch 'kvm-arm64/pmu-fixes' into kvmarm/next	Oliver Upton
	* kvm-arm64/pmu-fixes: : vPMU fixes for 6.15 courtesy of Akihiko Odaki : : Various fixes to KVM's vPMU implementation, notably ensuring : userspace-directed changes to the PMCs are reflected in the backing perf : events. KVM: arm64: PMU: Reload when resetting KVM: arm64: PMU: Reload when user modifies registers KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs KVM: arm64: PMU: Assume PMU presence in pmu-emul.c KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19	Merge branch 'kvm-arm64/writable-midr' into kvmarm/next	Oliver Upton
	* kvm-arm64/writable-midr: : Writable implementation ID registers, courtesy of Sebastian Ott : : Introduce a new capability that allows userspace to set the : ID registers that identify a CPU implementation: MIDR_EL1, REVIDR_EL1, : and AIDR_EL1. Also plug a hole in KVM's trap configuration where : SMIDR_EL1 was readable at EL1, despite the fact that KVM does not : support SME. KVM: arm64: Fix documentation for KVM_CAP_ARM_WRITABLE_IMP_ID_REGS KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writable KVM: arm64: Copy guest CTR_EL0 into hyp VM KVM: selftests: arm64: Test writes to MIDR,REVIDR,AIDR KVM: arm64: Allow userspace to change the implementation ID registers KVM: arm64: Load VPIDR_EL2 with the VM's MIDR_EL1 value KVM: arm64: Maintain per-VM copy of implementation ID regs KVM: arm64: Set HCR_EL2.TID1 unconditionally Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19	Merge branch 'kvm-arm64/pmuv3-asahi' into kvmarm/next	Oliver Upton
	* kvm-arm64/pmuv3-asahi: : Support PMUv3 for KVM guests on Apple silicon : : Take advantage of some IMPLEMENTATION DEFINED traps available on Apple : parts to trap-and-emulate the PMUv3 registers on behalf of a KVM guest. : Constrain the vPMU to a cycle counter and single event counter, as the : Apple PMU has events that cannot be counted on every counter. : : There is a small new interface between the ARM PMU driver and KVM, where : the PMU driver owns the PMUv3 -> hardware event mappings. arm64: Enable IMP DEF PMUv3 traps on Apple M* KVM: arm64: Provide 1 event counter on IMPDEF hardware drivers/perf: apple_m1: Provide helper for mapping PMUv3 events KVM: arm64: Remap PMUv3 events onto hardware KVM: arm64: Advertise PMUv3 if IMPDEF traps are present KVM: arm64: Compute synthetic sysreg ESR for Apple PMUv3 traps KVM: arm64: Move PMUVer filtering into KVM code KVM: arm64: Use guard() to cleanup usage of arm_pmus_lock KVM: arm64: Drop kvm_arm_pmu_available static key KVM: arm64: Use a cpucap to determine if system supports FEAT_PMUv3 KVM: arm64: Always support SW_INCR PMU event KVM: arm64: Compute PMCEID from arm_pmu's event bitmaps drivers/perf: apple_m1: Support host/guest event filtering drivers/perf: apple_m1: Refactor event select/filter configuration Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19	Merge branch 'kvm-arm64/pv-cpuid' into kvmarm/next	Oliver Upton
	* kvm-arm64/pv-cpuid: : Paravirtualized implementation ID, courtesy of Shameer Kolothum : : Big-little has historically been a pain in the ass to virtualize. The : implementation ID (MIDR, REVIDR, AIDR) of a vCPU can change at the whim : of vCPU scheduling. This can be particularly annoying when the guest : needs to know the underlying implementation to mitigate errata. : : "Hyperscalers" face a similar scheduling problem, where VMs may freely : migrate between hosts in a pool of heterogenous hardware. And yes, our : server-class friends are equally riddled with errata too. : : In absence of an architected solution to this wart on the ecosystem, : introduce support for paravirtualizing the implementation exposed : to a VM, allowing the VMM to describe the pool of implementations that a : VM may be exposed to due to scheduling/migration. : : Userspace is expected to intercept and handle these hypercalls using the : SMCCC filter UAPI, should it choose to do so. smccc: kvm_guest: Fix kernel builds for 32 bit arm KVM: selftests: Add test for KVM_REG_ARM_VENDOR_HYP_BMAP_2 smccc/kvm_guest: Enable errata based on implementation CPUs arm64: Make _midr_in_range_list() an exported function KVM: arm64: Introduce KVM_REG_ARM_VENDOR_HYP_BMAP_2 KVM: arm64: Specify hypercall ABI for retrieving target implementations arm64: Modify _midr_range() functions to read MIDR/REVIDR internally Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19	Merge branch 'kvm-arm64/nv-vgic' into kvmarm/next	Oliver Upton
	* kvm-arm64/nv-vgic: : NV VGICv3 support, courtesy of Marc Zyngier : : Support for emulating the GIC hypervisor controls and managing shadow : VGICv3 state for the L1 hypervisor. As part of it, bring in support for : taking IRQs to the L1 and UAPI to manage the VGIC maintenance interrupt. KVM: arm64: nv: Fail KVM init if asking for NV without GICv3 KVM: arm64: nv: Allow userland to set VGIC maintenance IRQ KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup KVM: arm64: nv: Propagate used_lrs between L1 and L0 contexts KVM: arm64: nv: Request vPE doorbell upon nested ERET to L2 KVM: arm64: nv: Respect virtual HCR_EL2.TWx setting KVM: arm64: nv: Add Maintenance Interrupt emulation KVM: arm64: nv: Handle L2->L1 transition on interrupt injection KVM: arm64: nv: Nested GICv3 emulation KVM: arm64: nv: Sanitise ICH_HCR_EL2 accesses KVM: arm64: nv: Plumb handling of GICv3 EL2 accesses KVM: arm64: nv: Add ICH_*_EL2 registers to vpcu_sysreg KVM: arm64: nv: Load timer before the GIC arm64: sysreg: Add layout for ICH_MISR_EL2 arm64: sysreg: Add layout for ICH_VTR_EL2 arm64: sysreg: Add layout for ICH_HCR_EL2 Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19	Merge tag 'samsung-drivers-6.15' of ↵	Arnd Bergmann
	https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into soc/drivers Samsung SoC drivers for v6.15 1. Add support for Exynos USI v1 serial engines. Drivers already supported newer IP blocks - USI v2 - present in Exynos850 and newer. A bit older ARM64 designs, like Exynos8895 use older USI v1 block. 2. Add Exynos ACPM (Alive Clock and Power Manager) protocol driver for Google GS101 SoC. ACPM protocol allows communication between the power management firmware and other embedded processors. 3. Exynos2200: Add PMU, ChipID and SYSREG Devicetree bindings. 4. Exynos7870: Add PMU and ChipID Devicetree bindings. 5. Various cleanups. * tag 'samsung-drivers-6.15' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux: dt-bindings: soc: samsung: exynos-usi: Drop unnecessary status from example soc: samsung: include linux/array_size.h where needed soc: samsung: exynos-chipid: add support for exynos7870 dt-bindings: soc: samsung: exynos-pmu: add exynos7870-pmu compatible dt-bindings: hwinfo: samsung,exynos-chipid: add exynos7870-chipid compatible soc: samsung: exynos-chipid: add exynos2200 SoC support dt-bindings: hwinfo: samsung,exynos-chipid: add exynos2200 compatible dt-bindings: soc: samsung: exynos-pmu: add exynos2200 compatible dt-bindings: soc: samsung: exynos-sysreg: add sysreg compatibles for exynos2200 firmware: Exynos ACPM: Fix spelling mistake "Faile" -> "Failed" MAINTAINERS: add entry for the Samsung Exynos ACPM mailbox protocol firmware: add Exynos ACPM protocol driver dt-bindings: firmware: add google,gs101-acpm-ipc soc: samsung: usi: implement support for USIv1 and exynos8895 soc: samsung: usi: add a routine for unconfiguring the ip dt-bindings: soc: samsung: usi: add USIv1 and samsung,exynos8895-usi soc: samsung: Use syscon_regmap_lookup_by_phandle_args Link: https://lore.kernel.org/r/20250309185601.10616-1-krzysztof.kozlowski@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-03-19	Merge tag 'tegra-for-6.15-firmware' of ↵	Arnd Bergmann
	https://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/drivers firmware: tegra: Changes for v6.15-rc1 Not much to this except for a simple typofix. * tag 'tegra-for-6.15-firmware' of https://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux: firmware: tegra: bpmp: Fix typo in bpmp-abi.h Link: https://lore.kernel.org/r/20250307162332.3451523-2-thierry.reding@gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-03-19	sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional	Ingo Molnar
	All the big Linux distros enable CONFIG_SCHED_DEBUG, because the various features it provides help not just with kernel development, but with system administration and user-space software development as well. Reflect this reality and enable this functionality unconditionally. Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250317104257.3496611-4-mingo@kernel.org
2025-03-19	Merge tag 'v6.15-rockchip-dts64-2' of ↵	Arnd Bergmann
	https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt New peripheral the sdhci controller on rk3528. Enablement of hdmi and hdmi audio on a number of additional boards. Better handling for scmi shared memory on rk3568 and a fix for the used SCMI clock ids on rk3576. As well as some fixes that were a bit late for trying to stuff them into 6.14 at this late stage of the cycle. * tag 'v6.15-rockchip-dts64-2' of https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: arm64: dts: rockchip: remove ethm0_clk0_25m_out from Sige5 gmac0 arm64: dts: rockchip: Fix PWM pinctrl names arm64: dts: rockchip: fix RK3576 SCMI clock IDs dt-bindings: clock: rk3576: add SCMI clocks arm64: dts: rockchip: Fix pcie reset gpio on Orange Pi 5 Max arm64: dts: rockchip: Enable HDMI audio output for ArmSoM Sige7 arm64: dts: rockchip: Enable onboard eMMC on Radxa E20C arm64: dts: rockchip: Add SDHCI controller for RK3528 arm64: dts: rockchip: Remove bluetooth node from rock-3a arm64: dts: rockchip: Move rk356x scmi SHMEM to reserved memory arm64: dts: rockchip: Add AP6275P wireless support to ArmSoM Sige7 arm64: dts: rockchip: Enable HDMI audio outputs for Orange Pi 5 Plus arm64: dts: rockchip: Enable HDMI1 on Orange Pi 5 Plus arm64: dts: rockchip: Enable HDMI audio outputs for Orange Pi 5 Max arm64: dts: rockchip: Enable HDMI0 audio output for Orange Pi 5/5B Link: https://lore.kernel.org/r/23866869.6Emhk5qWAg@phil Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-03-19	Merge tag 'zynqmp-dt-for-6.14' of https://github.com/Xilinx/linux-xlnx into ↵	Arnd Bergmann
	soc/dt arm64: ZynqMP DT changes for 6.15 - Align clock nodes with DT binding - Add the first VN-X Versal NET board - Move constants out of DT bindings * tag 'zynqmp-dt-for-6.14' of https://github.com/Xilinx/linux-xlnx: dt-bindings: xilinx: Deprecate header with firmware constants arm64: zynqmp: Use DT header for firmware constants arm64: versal-net: Add description for b2197-00 revA board dt-bindings: soc: Add new VN-X board description based on Versal NET arm64: zynqmp: add clock-output-names property in clock nodes Link: https://lore.kernel.org/r/CAHTX3d+u1VmxP4vm0peQS-ST7o0BuCpKUPRVCSLMfAAb=eV3Xg@mail.gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-03-19	Merge tag 'for-net-2025-03-14' of ↵	Paolo Abeni
	git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - hci_event: Fix connection regression between LE and non-LE adapters - Fix error code in chan_alloc_skb_cb() * tag 'for-net-2025-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: hci_event: Fix connection regression between LE and non-LE adapters Bluetooth: Fix error code in chan_alloc_skb_cb() ==================== Link: https://patch.msgid.link/20250314163847.110069-1-luiz.dentz@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-19	netconsole: allow selection of egress interface via MAC address	Uday Shankar
	Currently, netconsole has two methods of configuration - module parameter and configfs. The former interface allows for netconsole activation earlier during boot (by specifying the module parameter on the kernel command line), so it is preferred for debugging issues which arise before userspace is up/the configfs interface can be used. The module parameter syntax requires specifying the egress interface name. This requirement makes it hard to use for a couple reasons: - The egress interface name can be hard or impossible to predict. For example, installing a new network card in a system can change the interface names assigned by the kernel. - When constructing the module parameter, one may have trouble determining the original (kernel-assigned) name of the interface (which is the name that should be given to netconsole) if some stable interface naming scheme is in effect. A human can usually look at kernel logs to determine the original name, but this is very painful if automation is constructing the parameter. For these reasons, allow selection of the egress interface via MAC address when configuring netconsole using the module parameter. Update the netconsole documentation with an example of the new syntax. Selection of egress interface by MAC address via configfs is far less interesting (since when this interface can be used, one should be able to easily convert between MAC address and interface name), so it is left unimplemented. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Breno Leitao <leitao@debian.org> Tested-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250312-netconsole-v6-2-3437933e79b8@purestorage.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-19	net, treewide: define and use MAC_ADDR_STR_LEN	Uday Shankar
	There are a few places in the tree which compute the length of the string representation of a MAC address as 3 * ETH_ALEN - 1. Define a constant for this and use it where relevant. No functionality changes are expected. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@verge.net.au> Link: https://patch.msgid.link/20250312-netconsole-v6-1-3437933e79b8@purestorage.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-19	net: reorder dev_addr_sem lock	Stanislav Fomichev
	Lockdep complains about circular lock in 1 -> 2 -> 3 (see below). Change the lock ordering to be: - rtnl_lock - dev_addr_sem - netdev_ops (only for lower devices!) - team_lock (or other per-upper device lock) 1. rtnl_lock -> netdev_ops -> dev_addr_sem rtnl_setlink rtnl_lock do_setlink IFLA_ADDRESS on lower netdev_ops dev_addr_sem 2. rtnl_lock -> team_lock -> netdev_ops rtnl_newlink rtnl_lock do_setlink IFLA_MASTER on lower do_set_master team_add_slave team_lock team_port_add dev_set_mtu netdev_ops 3. rtnl_lock -> dev_addr_sem -> team_lock rtnl_newlink rtnl_lock do_setlink IFLA_ADDRESS on upper dev_addr_sem netif_set_mac_address team_set_mac_address team_lock 4. rtnl_lock -> netdev_ops -> dev_addr_sem rtnl_lock dev_ifsioc dev_set_mac_address_user __tun_chr_ioctl rtnl_lock dev_set_mac_address_user tap_ioctl rtnl_lock dev_set_mac_address_user dev_set_mac_address_user netdev_lock_ops netif_set_mac_address_user dev_addr_sem v2: - move lock reorder to happen after kmalloc (Kuniyuki) Cc: Kohei Enju <enjuk@amazon.com> Fixes: df43d8bf1031 ("net: replace dev_addr_sem with netdev instance lock") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250312190513.1252045-3-sdf@fomichev.me Tested-by: Lei Yang <leiyang@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-19	Revert "net: replace dev_addr_sem with netdev instance lock"	Stanislav Fomichev
	This reverts commit df43d8bf10316a7c3b1e47e3cc0057a54df4a5b8. Cc: Kohei Enju <enjuk@amazon.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Fixes: df43d8bf1031 ("net: replace dev_addr_sem with netdev instance lock") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250312190513.1252045-2-sdf@fomichev.me Tested-by: Lei Yang <leiyang@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-19	net: stmmac: allow platforms to use PHY tx clock stop capability	Russell King (Oracle)
	Allow platform glue to instruct stmmac to make use of the PHY transmit clock stop capability when deciding whether to allow the transmit clock from the DWMAC core to be stopped. Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tsITp-005vG9-Px@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-19	io_uring: rename the data cmd cache	Pavel Begunkov
	Pick a more descriptive name for the cmd async data cache. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20250319061251.21452-2-sidong.yang@furiosa.ai Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-19	bpf: Maintain FIFO property for rqspinlock unlock	Kumar Kartikeya Dwivedi
	Since out-of-order unlocks are unsupported for rqspinlock, and irqsave variants enforce strict FIFO ordering anyway, make the same change for normal non-irqsave variants, such that FIFO ordering is enforced. Two new verifier state fields (active_lock_id, active_lock_ptr) are used to denote the top of the stack, and prev_id and prev_ptr are ascertained whenever popping the topmost entry through an unlock. Take special care to make these fields part of the state comparison in refsafe. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250316040541.108729-25-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-19	bpf: Implement verifier support for rqspinlock	Kumar Kartikeya Dwivedi
	Introduce verifier-side support for rqspinlock kfuncs. The first step is allowing bpf_res_spin_lock type to be defined in map values and allocated objects, so BTF-side is updated with a new BPF_RES_SPIN_LOCK field to recognize and validate. Any object cannot have both bpf_spin_lock and bpf_res_spin_lock, only one of them (and at most one of them per-object, like before) must be present. The bpf_res_spin_lock can also be used to protect objects that require lock protection for their kfuncs, like BPF rbtree and linked list. The verifier plumbing to simulate success and failure cases when calling the kfuncs is done by pushing a new verifier state to the verifier state stack which will verify the failure case upon calling the kfunc. The path where success is indicated creates all lock reference state and IRQ state (if necessary for irqsave variants). In the case of failure, the state clears the registers r0-r5, sets the return value, and skips kfunc processing, proceeding to the next instruction. When marking the return value for success case, the value is marked as 0, and for the failure case as [-MAX_ERRNO, -1]. Then, in the program, whenever user checks the return value as 'if (ret)' or 'if (ret < 0)' the verifier never traverses such branches for success cases, and would be aware that the lock is not held in such cases. We push the kfunc state in check_kfunc_call whenever rqspinlock kfuncs are invoked. We introduce a kfunc_class state to avoid mixing lock irqrestore kfuncs with IRQ state created by bpf_local_irq_save. With all this infrastructure, these kfuncs become usable in programs while satisfying all safety properties required by the kernel. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250316040541.108729-24-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-19	bpf: Introduce rqspinlock kfuncs	Kumar Kartikeya Dwivedi
	Introduce four new kfuncs, bpf_res_spin_lock, and bpf_res_spin_unlock, and their irqsave/irqrestore variants, which wrap the rqspinlock APIs. bpf_res_spin_lock returns a conditional result, depending on whether the lock was acquired (NULL is returned when lock acquisition succeeds, non-NULL upon failure). The memory pointed to by the returned pointer upon failure can be dereferenced after the NULL check to obtain the error code. Instead of using the old bpf_spin_lock type, introduce a new type with the same layout, and the same alignment, but a different name to avoid type confusion. Preemption is disabled upon successful lock acquisition, however IRQs are not. Special kfuncs can be introduced later to allow disabling IRQs when taking a spin lock. Resilient locks are safe against AA deadlocks, hence not disabling IRQs currently does not allow violation of kernel safety. __irq_flag annotation is used to accept IRQ flags for the IRQ-variants, with the same semantics as existing bpf_local_irq_{save, restore}. These kfuncs will require additional verifier-side support in subsequent commits, to allow programs to hold multiple locks at the same time. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250316040541.108729-23-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-19	rqspinlock: Add entry to Makefile, MAINTAINERS	Kumar Kartikeya Dwivedi
	Ensure that the rqspinlock code is only built when the BPF subsystem is compiled in. Depending on queued spinlock support, we may or may not end up building the queued spinlock slowpath, and instead fallback to the test-and-set implementation. Also add entries to MAINTAINERS file. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250316040541.108729-18-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-19	rqspinlock: Add macros for rqspinlock usage	Kumar Kartikeya Dwivedi
	Introduce helper macros that wrap around the rqspinlock slow path and provide an interface analogous to the raw_spin_lock API. Note that in case of error conditions, preemption and IRQ disabling is automatically unrolled before returning the error back to the caller. Ensure that in absence of CONFIG_QUEUED_SPINLOCKS support, we fallback to the test-and-set implementation. Add some comments describing the subtle memory ordering logic during unlock, and why it's safe. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250316040541.108729-17-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-19	rqspinlock: Add basic support for CONFIG_PARAVIRT	Kumar Kartikeya Dwivedi
	We ripped out PV and virtualization related bits from rqspinlock in an earlier commit, however, a fair lock performs poorly within a virtual machine when the lock holder is preempted. As such, retain the virt_spin_lock fallback to test and set lock, but with timeout and deadlock detection. We can do this by simply depending on the resilient_tas_spin_lock implementation from the previous patch. We don't integrate support for CONFIG_PARAVIRT_SPINLOCKS yet, as that requires more involved algorithmic changes and introduces more complexity. It can be done when the need arises in the future. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250316040541.108729-15-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-19	rqspinlock: Add a test-and-set fallback	Kumar Kartikeya Dwivedi
	Include a test-and-set fallback when queued spinlock support is not available. Introduce a rqspinlock type to act as a fallback when qspinlock support is absent. Include ifdef guards to ensure the slow path in this file is only compiled when CONFIG_QUEUED_SPINLOCKS=y. Subsequent patches will add further logic to ensure fallback to the test-and-set implementation when queued spinlock support is unavailable on an architecture. Unlike other waiting loops in rqspinlock code, the one for test-and-set has no theoretical upper bound under contention, therefore we need a longer timeout than usual. Bump it up to a second in this case. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250316040541.108729-14-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-19	rqspinlock: Add deadlock detection and recovery	Kumar Kartikeya Dwivedi
	While the timeout logic provides guarantees for the waiter's forward progress, the time until a stalling waiter unblocks can still be long. The default timeout of 1/4 sec can be excessively long for some use cases. Additionally, custom timeouts may exacerbate recovery time. Introduce logic to detect common cases of deadlocks and perform quicker recovery. This is done by dividing the time from entry into the locking slow path until the timeout into intervals of 1 ms. Then, after each interval elapses, deadlock detection is performed, while also polling the lock word to ensure we can quickly break out of the detection logic and proceed with lock acquisition. A 'held_locks' table is maintained per-CPU where the entry at the bottom denotes a lock being waited for or already taken. Entries coming before it denote locks that are already held. The current CPU's table can thus be looked at to detect AA deadlocks. The tables from other CPUs can be looked at to discover ABBA situations. Finally, when a matching entry for the lock being taken on the current CPU is found on some other CPU, a deadlock situation is detected. This function can take a long time, therefore the lock word is constantly polled in each loop iteration to ensure we can preempt detection and proceed with lock acquisition, using the is_lock_released check. We set 'spin' member of rqspinlock_timeout struct to 0 to trigger deadlock checks immediately to perform faster recovery. Note: Extending lock word size by 4 bytes to record owner CPU can allow faster detection for ABBA. It is typically the owner which participates in a ABBA situation. However, to keep compatibility with existing lock words in the kernel (struct qspinlock), and given deadlocks are a rare event triggered by bugs, we choose to favor compatibility over faster detection. The release_held_lock_entry function requires an smp_wmb, while the release store on unlock will provide the necessary ordering for us. Add comments to document the subtleties of why this is correct. It is possible for stores to be reordered still, but in the context of the deadlock detection algorithm, a release barrier is sufficient and needn't be stronger for unlock's case. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250316040541.108729-13-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>