linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-04-08	net: rps: remove kfree_rcu_mightsleep() use	Eric Dumazet
	Add an rcu_head to sd_flow_limit and rps_sock_flow_table structs to use the more conventional and predictable k[v]free_rcu(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250407163602.170356-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08	Merge tag 'cgroup-for-6.15-rc1-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - A number of cpuset remote partition related fixes and cleanups along with selftest updates. - A change from this merge window made cgroup_rstat_updated_list() called outside cgroup_rstat_lock leading to list corruptions. Fix it by relocating the call inside the lock. * tag 'cgroup-for-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup/cpuset: Fix race between newly created partition and dying one cgroup: rstat: call cgroup_rstat_updated_list with cgroup_rstat_lock selftest/cgroup: Add a remote partition transition test to test_cpuset_prs.sh selftest/cgroup: Clean up and restructure test_cpuset_prs.sh selftest/cgroup: Update test_cpuset_prs.sh to use \| as effective CPUs and state separator cgroup/cpuset: Remove unneeded goto in sched_partition_write() and rename it cgroup/cpuset: Code cleanup and comment update cgroup/cpuset: Don't allow creation of local partition over a remote one cgroup/cpuset: Remove remote_partition_check() & make update_cpumasks_hier() handle remote partition cgroup/cpuset: Fix error handling in remote_partition_disable() cgroup/cpuset: Fix incorrect isolated_cpus update in update_parent_effective_cpumask()
2025-04-08	perf/x86/intel: Support auto counter reload	Kan Liang
	The relative rates among two or more events are useful for performance analysis, e.g., a high branch miss rate may indicate a performance issue. Usually, the samples with a relative rate that exceeds some threshold are more useful. However, the traditional sampling takes samples of events separately. To get the relative rates among two or more events, a high sample rate is required, which can bring high overhead. Many samples taken in the non-hotspot area are also dropped (useless) in the post-process. The auto counter reload (ACR) feature takes samples when the relative rate of two or more events exceeds some threshold, which provides the fine-grained information at a low cost. To support the feature, two sets of MSRs are introduced. For a given counter IA32_PMC_GPn_CTR/IA32_PMC_FXm_CTR, bit fields in the IA32_PMC_GPn_CFG_B/IA32_PMC_FXm_CFG_B MSR indicate which counter(s) can cause a reload of that counter. The reload value is stored in the IA32_PMC_GPn_CFG_C/IA32_PMC_FXm_CFG_C. The details can be found at Intel SDM (085), Volume 3, 21.9.11 Auto Counter Reload. In the hw_config(), an ACR event is specially configured, because the cause/reloadable counter mask has to be applied to the dyn_constraint. Besides the HW limit, e.g., not support perf metrics, PDist and etc, a SW limit is applied as well. ACR events in a group must be contiguous. It facilitates the later conversion from the event idx to the counter idx. Otherwise, the intel_pmu_acr_late_setup() has to traverse the whole event list again to find the "cause" event. Also, add a new flag PERF_X86_EVENT_ACR to indicate an ACR group, which is set to the group leader. The late setup() is also required for an ACR group. It's to convert the event idx to the counter idx, and saved it in hw.config1. The ACR configuration MSRs are only updated in the enable_event(). The disable_event() doesn't clear the ACR CFG register. Add acr_cfg_b/acr_cfg_c in the struct cpu_hw_events to cache the MSR values. It can avoid a MSR write if the value is not changed. Expose an acr_mask to the sysfs. The perf tool can utilize the new format to configure the relation of events in the group. The bit sequence of the acr_mask follows the events enabled order of the group. Example: Here is the snippet of the mispredict.c. Since the array has a random numbers, jumps are random and often mispredicted. The mispredicted rate depends on the compared value. For the Loop1, ~11% of all branches are mispredicted. For the Loop2, ~21% of all branches are mispredicted. main() { ... for (i = 0; i < N; i++) data[i] = rand() % 256; ... /* Loop 1 / for (k = 0; k < 50; k++) for (i = 0; i < N; i++) if (data[i] >= 64) sum += data[i]; ... ... / Loop 2 */ for (k = 0; k < 50; k++) for (i = 0; i < N; i++) if (data[i] >= 128) sum += data[i]; ... } Usually, a code with a high branch miss rate means a bad performance. To understand the branch miss rate of the codes, the traditional method usually samples both branches and branch-misses events. E.g., perf record -e "{cpu_atom/branch-misses/ppu, cpu_atom/branch-instructions/u}" -c 1000000 -- ./mispredict [ perf record: Woken up 4 times to write data ] [ perf record: Captured and wrote 0.925 MB perf.data (5106 samples) ] The 5106 samples are from both events and spread in both Loops. In the post-process stage, a user can know that the Loop 2 has a 21% branch miss rate. Then they can focus on the samples of branch-misses events for the Loop 2. With this patch, the user can generate the samples only when the branch miss rate > 20%. For example, perf record -e "{cpu_atom/branch-misses,period=200000,acr_mask=0x2/ppu, cpu_atom/branch-instructions,period=1000000,acr_mask=0x3/u}" -- ./mispredict (Two different periods are applied to branch-misses and branch-instructions. The ratio is set to 20%. If the branch-instructions is overflowed first, the branch-miss rate < 20%. No samples should be generated. All counters should be automatically reloaded. If the branch-misses is overflowed first, the branch-miss rate > 20%. A sample triggered by the branch-misses event should be generated. Just the counter of the branch-instructions should be automatically reloaded. The branch-misses event should only be automatically reloaded when the branch-instructions is overflowed. So the "cause" event is the branch-instructions event. The acr_mask is set to 0x2, since the event index in the group of branch-instructions is 1. The branch-instructions event is automatically reloaded no matter which events are overflowed. So the "cause" events are the branch-misses and the branch-instructions event. The acr_mask should be set to 0x3.) [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.098 MB perf.data (2498 samples) ] $perf report Percent │154: movl $0x0,-0x14(%rbp) │ ↓ jmp 1af │ for (i = j; i < N; i++) │15d: mov -0x10(%rbp),%eax │ mov %eax,-0x18(%rbp) │ ↓ jmp 1a2 │ if (data[i] >= 128) │165: mov -0x18(%rbp),%eax │ cltq │ lea 0x0(,%rax,4),%rdx │ mov -0x8(%rbp),%rax │ add %rdx,%rax │ mov (%rax),%eax │ ┌──cmp $0x7f,%eax 100.00 0.00 │ ├──jle 19e │ │sum += data[i]; The 2498 samples are all from the branch-misses events for the Loop 2. The number of samples and overhead is significantly reduced without losing any information. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lkml.kernel.org/r/20250327195217.2683619-6-kan.liang@linux.intel.com
2025-04-08	perf: Extend the bit width of the arch-specific flag	Kan Liang
	The auto counter reload feature requires an event flag to indicate an auto counter reload group, which can only be scheduled on specific counters that enumerated in CPUID. However, the hw_perf_event.flags has run out on X86. Two solutions were considered to address the issue. - Currently, 20 bits are reserved for the architecture-specific flags. Only the bit 31 is used for the generic flag. There is still plenty of space left. Reserve 8 more bits for the arch-specific flags. - Add a new X86 specific hw_perf_event.flags1 to support more flags. The former is implemented. Enough room is still left in the global generic flag. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lkml.kernel.org/r/20250327195217.2683619-4-kan.liang@linux.intel.com
2025-04-08	perf/x86: Add dynamic constraint	Kan Liang
	More and more features require a dynamic event constraint, e.g., branch counter logging, auto counter reload, Arch PEBS, etc. Add a generic flag, PMU_FL_DYN_CONSTRAINT, to indicate the case. It avoids keeping adding the individual flag in intel_cpuc_prepare(). Add a variable dyn_constraint in the struct hw_perf_event to track the dynamic constraint of the event. Apply it if it's updated. Apply the generic dynamic constraint for branch counter logging. Many features on and after V6 require dynamic constraint. So unconditionally set the flag for V6+. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lkml.kernel.org/r/20250327195217.2683619-2-kan.liang@linux.intel.com
2025-04-08	perf: Make perf_pmu_unregister() useable	Peter Zijlstra
	Previously it was only safe to call perf_pmu_unregister() if there were no active events of that pmu around -- which was impossible to guarantee since it races all sorts against perf_init_event(). Rework the whole thing by: - keeping track of all events for a given pmu - 'hiding' the pmu from perf_init_event() - waiting for the appropriate (s)rcu grace periods such that all prior references to the PMU will be completed - detaching all still existing events of that pmu (see first point) and moving them to a new REVOKED state. - actually freeing the pmu data. Where notably the new REVOKED state must inhibit all event actions from reaching code that wants to use event->pmu. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lkml.kernel.org/r/20250307193723.525402029@infradead.org
2025-04-08	perf: Fix hang while freeing sigtrap event	Frederic Weisbecker
	Perf can hang while freeing a sigtrap event if a related deferred signal hadn't managed to be sent before the file got closed: perf_event_overflow() task_work_add(perf_pending_task) fput() task_work_add(____fput()) task_work_run() ____fput() perf_release() perf_event_release_kernel() _free_event() perf_pending_task_sync() task_work_cancel() -> FAILED rcuwait_wait_event() Once task_work_run() is running, the list of pending callbacks is removed from the task_struct and from this point on task_work_cancel() can't remove any pending and not yet started work items, hence the task_work_cancel() failure and the hang on rcuwait_wait_event(). Task work could be changed to remove one work at a time, so a work running on the current task can always cancel a pending one, however the wait / wake design is still subject to inverted dependencies when remote targets are involved, as pictured by Oleg: T1 T2 fd = perf_event_open(pid => T2->pid); fd = perf_event_open(pid => T1->pid); close(fd) close(fd) <IRQ> <IRQ> perf_event_overflow() perf_event_overflow() task_work_add(perf_pending_task) task_work_add(perf_pending_task) </IRQ> </IRQ> fput() fput() task_work_add(____fput()) task_work_add(____fput()) task_work_run() task_work_run() ____fput() ____fput() perf_release() perf_release() perf_event_release_kernel() perf_event_release_kernel() _free_event() _free_event() perf_pending_task_sync() perf_pending_task_sync() rcuwait_wait_event() rcuwait_wait_event() Therefore the only option left is to acquire the event reference count upon queueing the perf task work and release it from the task work, just like it was done before 3a5465418f5f ("perf: Fix event leak upon exec and file release") but without the leaks it fixed. Some adjustments are necessary to make it work: * A child event might dereference its parent upon freeing. Care must be taken to release the parent last. * Some places assuming the event doesn't have any reference held and therefore can be freed right away must instead put the reference and let the reference counting to its job. Reported-by: "Yi Lai" <yi1.lai@linux.intel.com> Closes: https://lore.kernel.org/all/Zx9Losv4YcJowaP%2F@ly-workstation/ Reported-by: syzbot+3c4321e10eea460eb606@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/673adf75.050a0220.87769.0024.GAE@google.com/ Fixes: 3a5465418f5f ("perf: Fix event leak upon exec and file release") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20250304135446.18905-1-frederic@kernel.org
2025-04-08	drm/tests: helpers: Create kunit helper to destroy a drm_display_mode	Maxime Ripard
	A number of test suites call functions that expect the returned drm_display_mode to be destroyed eventually. However, none of the tests called drm_mode_destroy, which results in a memory leak. Since drm_mode_destroy takes two pointers as argument, we can't use a kunit wrapper. Let's just create a helper every test suite can use. Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20250408-drm-kunit-drm-display-mode-memleak-v1-1-996305a2e75a@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-04-08	RDMA/rxe: Enable ODP in RDMA FLUSH operation	Daisuke Matsuda
	For persistent memories, add rxe_odp_flush_pmem_iova() so that ODP specific steps are executed. Otherwise, no additional consideration is required. Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Link: https://patch.msgid.link/20250324075649.3313968-2-matsuda-daisuke@fujitsu.com Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-08	sctp: detect and prevent references to a freed transport in sendmsg	Ricardo Cañuelo Navarro
	sctp_sendmsg() re-uses associations and transports when possible by doing a lookup based on the socket endpoint and the message destination address, and then sctp_sendmsg_to_asoc() sets the selected transport in all the message chunks to be sent. There's a possible race condition if another thread triggers the removal of that selected transport, for instance, by explicitly unbinding an address with setsockopt(SCTP_SOCKOPT_BINDX_REM), after the chunks have been set up and before the message is sent. This can happen if the send buffer is full, during the period when the sender thread temporarily releases the socket lock in sctp_wait_for_sndbuf(). This causes the access to the transport data in sctp_outq_select_transport(), when the association outqueue is flushed, to result in a use-after-free read. This change avoids this scenario by having sctp_transport_free() signal the freeing of the transport, tagging it as "dead". In order to do this, the patch restores the "dead" bit in struct sctp_transport, which was removed in commit 47faa1e4c50e ("sctp: remove the dead field of sctp_transport"). Then, in the scenario where the sender thread has released the socket lock in sctp_wait_for_sndbuf(), the bit is checked again after re-acquiring the socket lock to detect the deletion. This is done while holding a reference to the transport to prevent it from being freed in the process. If the transport was deleted while the socket lock was relinquished, sctp_sendmsg_to_asoc() will return -EAGAIN to let userspace retry the send. The bug was found by a private syzbot instance (see the error report [1] and the C reproducer that triggers it [2]). Link: https://people.igalia.com/rcn/kernel_logs/20250402__KASAN_slab-use-after-free_Read_in_sctp_outq_select_transport.txt [1] Link: https://people.igalia.com/rcn/kernel_logs/20250402__KASAN_slab-use-after-free_Read_in_sctp_outq_select_transport__repro.c [2] Cc: stable@vger.kernel.org Fixes: df132eff4638 ("sctp: clear the transport of some out_chunk_list chunks in sctp_assoc_rm_peer") Suggested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Ricardo Cañuelo Navarro <rcn@igalia.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20250404-kasan_slab-use-after-free_read_in_sctp_outq_select_transport__20250404-v1-1-5ce4a0b78ef2@igalia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-08	ASoC: Intel: avs: 16 channels support	Mark Brown
	Merge series from Cezary Rojewski <cezary.rojewski@intel.com>: Relatively small delta-wise patchset which raises max channels supported from 8 to 16. The existing limitation is software-based, not hardware based. The hardware, as per HDAudio specification, section 1.2.2, (relevant register at SDnFMT, section 3.3.41) supports the configurations for years. The avs-driver becomes the first consumer of that configuration on the Linux kernel side. Set starts off with update to string_helpers so that functionality added with parse_int_array_user() can be utilized in kernel-kernel interactions. Follow up is rasing the cap on HDAudio-library side. The format selection procedure found in the library is good-to-go as is. Everything that follows these two patches is avs-driver specific: - raise channels_max for every DAI-driver template - provide i2s_test module parameter for testing purposes. When combined with I2S loopback card, allows to test 16ch on most Intel hardware post Broadwell era - adjust TDM masks to reflect the 8 -> 16 channels change
2025-04-08	ASoC: Intel: avs: Add support for FCL platform	Mark Brown
	Merge series from Cezary Rojewski <cezary.rojewski@intel.com>: The patchset is fairly straightforward - add support for Automotive platforms based on new DSP architecture, Frisco Lake (FCL), a PantherLake (PTL)-based platform is an example of. The cAVS architecture which all Intel AudioDSP followed for years ends with RaptorLake familty. Like all the major updates, this one received new name too - Audio Context Engine (ACE). While the range of improvements and changes on the firmware/hardware side is large, software survives this evolution without need of any major refactoring. Additional hardware changes brought with LunarLake (LNL, ACE 2.0) call for update in PCM-area. The GPDMAs previously utilized for non-HDAudio transfer types are no longer there, everything is running through HDAudio LINK on the Back-End side now. In terms of code, the mtl.c file, provided with patch 05 'ASoC: Intel: avs: PTL-based platforms support' hosts largest number of new handlers - new IRQ and INT control and DSP-cores management. Combined with lnl.c and ptl.c which layer the architecture changes done over ACE generations, provide support for PTL-based platforms e.g.: FCL. The inheritance in summary: mtl.c <- lnl.c <- ptl.c The functional update to HDAudio library is there to help avs-driver read certain capabilities directly from the hardware. Once the pointer to LINK is obtained, there is no need to call AudioDSP firmware to get the caps.
2025-04-08	Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS	NeilBrown
	try_lookup_noperm() and d_hash_and_lookup() are nearly identical. The former does some validation of the name where the latter doesn't. Outside of the VFS that validation is likely valuable, and having only one exported function for this task is certainly a good idea. So make d_hash_and_lookup() local to VFS files and change all other callers to try_lookup_noperm(). Note that the arguments are swapped. Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250319031545.2999807-6-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-08	VFS: rename lookup_one_len family to lookup_noperm and remove permission check	NeilBrown
	The lookup_one_len family of functions is (now) only used internally by a filesystem on itself either - in a context where permission checking is irrelevant such as by a virtual filesystem populating itself, or xfs accessing its ORPHANAGE or dquota accessing the quota file; or - in a context where a permission check (MAY_EXEC on the parent) has just been performed such as a network filesystem finding in "silly-rename" file in the same directory. This is also the context after the _parentat() functions where currently lookup_one_qstr_excl() is used. So the permission check is pointless. The name "one_len" is unhelpful in understanding the purpose of these functions and should be changed. Most of the callers pass the len as "strlen()" so using a qstr and QSTR() can simplify the code. This patch renames these functions (include lookup_positive_unlocked() which is part of the family despite the name) to have a name based on "lookup_noperm". They are changed to receive a 'struct qstr' instead of separate name and len. In a few cases the use of QSTR() results in a new call to strlen(). try_lookup_noperm() takes a pointer to a qstr instead of the whole qstr. This is consistent with d_hash_and_lookup() (which is nearly identical) and useful for lookup_noperm_unlocked(). The new lookup_noperm_common() doesn't take a qstr yet. That will be tidied up in a subsequent patch. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/r/20250319031545.2999807-5-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-08	gpio: deprecate the GPIOD_FLAGS_BIT_NONEXCLUSIVE flag	Bartosz Golaszewski
	The non-exclusive GPIO request flag looks like a functional feature but is in fact a workaround for a corner-case that got out of hand. It should be removed so deprecate it officially so that nobody uses it anymore. Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250401-gpio-todo-remove-nonexclusive-v2-1-7c1380797b0d@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-04-08	fs: predict not having to do anything in fdput()	Mateusz Guzik
	This matches the annotation in fdget(). Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/20250406235806.1637000-2-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-08	media: v4l2: Add NV15 and NV20 pixel formats	Jonas Karlman
	Add NV15 and NV20 pixel formats used by the Rockchip Video Decoder for 10-bit buffers. NV15 and NV20 is 10-bit 4:2:0/4:2:2 semi-planar YUV formats similar to NV12 and NV16, using 10-bit components with no padding between each component. Instead, a group of 4 luminance/chrominance samples are stored over 5 bytes in little endian order: YYYY = UVUV = 4 * 10 bits = 40 bits = 5 bytes The '15' and '20' suffix refers to the optimum effective bits per pixel which is achieved when the total number of luminance samples is a multiple of 8 for NV15 and 4 for NV20. Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Tested-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Tested-by: Christopher Obbard <chris.obbard@collabora.com> Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-04-07	drm/xe/bmg: Add one additional PCI ID	Matt Roper
	One additional BMG PCI ID has been added to the spec; make sure our driver recognizes devices with this ID properly. Bspec: 68090 Cc: stable@vger.kernel.org # v6.12+ Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250325224709.4073080-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit cca9734ebe55f6af11ce8d57ca1afdc4d158c808) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-04-07	RDMA/mlx5: Fix compilation warning when USER_ACCESS isn't set	Mark Bloch
	The cited commit made fs.c always compile, even when INFINIBAND_USER_ACCESS isn't set. This results in a compilation warning about an unused object when compiling with W=1 and USER_ACCESS is unset. Fix this by defining uverbs_destroy_def_handler() even when USER_ACCESS isn't set. Fixes: 36e0d433672f ("RDMA/mlx5: Compile fs.c regardless of INFINIBAND_USER_ACCESS config") Link: https://patch.msgid.link/r/20250402070944.1022093-1-mbloch@nvidia.com Signed-off-by: Mark Bloch <mbloch@nvidia.com> Tested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-04-07	net: hold instance lock during NETDEV_CHANGE	Stanislav Fomichev
	Cosmin reports an issue with ipv6_add_dev being called from NETDEV_CHANGE notifier: [ 3455.008776] ? ipv6_add_dev+0x370/0x620 [ 3455.010097] ipv6_find_idev+0x96/0xe0 [ 3455.010725] addrconf_add_dev+0x1e/0xa0 [ 3455.011382] addrconf_init_auto_addrs+0xb0/0x720 [ 3455.013537] addrconf_notify+0x35f/0x8d0 [ 3455.014214] notifier_call_chain+0x38/0xf0 [ 3455.014903] netdev_state_change+0x65/0x90 [ 3455.015586] linkwatch_do_dev+0x5a/0x70 [ 3455.016238] rtnl_getlink+0x241/0x3e0 [ 3455.019046] rtnetlink_rcv_msg+0x177/0x5e0 Similarly, linkwatch might get to ipv6_add_dev without ops lock: [ 3456.656261] ? ipv6_add_dev+0x370/0x620 [ 3456.660039] ipv6_find_idev+0x96/0xe0 [ 3456.660445] addrconf_add_dev+0x1e/0xa0 [ 3456.660861] addrconf_init_auto_addrs+0xb0/0x720 [ 3456.661803] addrconf_notify+0x35f/0x8d0 [ 3456.662236] notifier_call_chain+0x38/0xf0 [ 3456.662676] netdev_state_change+0x65/0x90 [ 3456.663112] linkwatch_do_dev+0x5a/0x70 [ 3456.663529] __linkwatch_run_queue+0xeb/0x200 [ 3456.663990] linkwatch_event+0x21/0x30 [ 3456.664399] process_one_work+0x211/0x610 [ 3456.664828] worker_thread+0x1cc/0x380 [ 3456.665691] kthread+0xf4/0x210 Reclassify NETDEV_CHANGE as a notifier that consistently runs under the instance lock. Link: https://lore.kernel.org/netdev/aac073de8beec3e531c86c101b274d434741c28e.camel@nvidia.com/ Reported-by: Cosmin Ratiu <cratiu@nvidia.com> Tested-by: Cosmin Ratiu <cratiu@nvidia.com> Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250404161122.3907628-1-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-07	lib/string_helpers: Introduce parse_int_array()	Cezary Rojewski
	Existing parse_inte_array_user() works with __user buffers only. Separate array parsing from __user bits so the functionality can be utilized with kernel buffers too. Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com> Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20250404090337.3564117-2-cezary.rojewski@intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-04-07	ASoC: Intel: avs: PTL-based platforms support	Cezary Rojewski
	Define handlers specific to ACE platforms, that Frisco Lake (FCL), a PantherLake (PTL)-based platform, is founded upon. Most operations are still inherited from their predecessors with the major difference being AudioDSP cores management - replaced by DSP-domain power management. Software has to ensure the DSP domain is both powered on and its power-gating disabled before it can be utilized for streaming. Reviewed-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com> Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com> Link: https://patch.msgid.link/20250407112352.3720779-6-cezary.rojewski@intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-04-07	ASoC: Intel: avs: Read HW capabilities when possible	Cezary Rojewski
	Starting with LunarLake (LNL) and onward, some hardware capabilities are visible to the sound driver directly. At the same time, these may no longer be visible to the AudioDSP firmware. Update resource allocation function to rely on the registers when possible. Reviewed-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com> Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com> Link: https://patch.msgid.link/20250407112352.3720779-4-cezary.rojewski@intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-04-07	ALSA: hda: Allow to fetch hlink by ID	Cezary Rojewski
	Starting with LNL platform, Intel HDAudio Links carry IDs specifying non-HDAudio transfer type they help facilitate e.g.: 0xC0 for I2S as defined by AZX_REG_ML_LEPTR_ID_INTEL_SSP. The mechanism accounts for LEPTR register as it is Reserved if LCAP.ALT for given Link equals 0. Reviewed-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com> Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com> Link: https://patch.msgid.link/20250407112352.3720779-2-cezary.rojewski@intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-04-07	drm/bridge: add function interface for DisplayPort audio implementation	Dmitry Baryshkov
	It is common for the DisplayPort bridges to implement audio support. In preparation to providing a generic framework for DP audio, add corresponding interface to struct drm_bridge. As suggested by Maxime for now this is mostly c&p of the corresponding HDMI audio API. Reviewed-by: Maxime Ripard <mripard@kernel.org> Link: https://lore.kernel.org/r/20250314-dp-hdmi-audio-v6-2-dbd228fa73d7@oss.qualcomm.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
2025-04-07	drm/bridge: split HDMI Audio from DRM_BRIDGE_OP_HDMI	Dmitry Baryshkov
	As pointed out by Laurent, OP bits are supposed to describe operations. Split DRM_BRIDGE_OP_HDMI_AUDIO from DRM_BRIDGE_OP_HDMI instead of overloading DRM_BRIDGE_OP_HDMI. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Maxime Ripard <mripard@kernel.org> Link: https://lore.kernel.org/r/20250314-dp-hdmi-audio-v6-1-dbd228fa73d7@oss.qualcomm.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
2025-04-07	Merge drm/drm-next into drm-misc-next	Thomas Zimmermann
	Backmerging to get v6.15-rc1 into drm-misc-next. Also fixes a build issue when enabling CONFIG_DRM_SCHED_KUNIT_TEST. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2025-04-07	platform/x86: intel_pmc_ipc: add option to build without ACPI	David E. Box
	Introduce a configuration option that allows users to build the intel_pmc_ipc driver without ACPI support. This is useful for systems where ACPI is not available or desired. Based on the discussion from the patch [1], it was necessary to provide this option to accommodate specific use cases. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20250227121522.1802832-6-yong.liang.choong@linux.intel.com/#26280764 [1] Signed-off-by: David E. Box <david.e.box@linux.intel.com> Co-developed-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Reviewed-by: Andy Shevchenko <andy@kernel.org> Link: https://lore.kernel.org/r/20250313085526.1439092-1-yong.liang.choong@linux.intel.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-04-07	Merge branch 'kvm-tdx-initial' into HEAD	Paolo Bonzini
	This large commit contains the initial support for TDX in KVM. All x86 parts enable the host-side hypercalls that KVM uses to talk to the TDX module, a software component that runs in a special CPU mode called SEAM (Secure Arbitration Mode). The series is in turn split into multiple sub-series, each with a separate merge commit: - Initialization: basic setup for using the TDX module from KVM, plus ioctls to create TDX VMs and vCPUs. - MMU: in TDX, private and shared halves of the address space are mapped by different EPT roots, and the private half is managed by the TDX module. Using the support that was added to the generic MMU code in 6.14, add support for TDX's secure page tables to the Intel side of KVM. Generic KVM code takes care of maintaining a mirror of the secure page tables so that they can be queried efficiently, and ensuring that changes are applied to both the mirror and the secure EPT. - vCPU enter/exit: implement the callbacks that handle the entry of a TDX vCPU (via the SEAMCALL TDH.VP.ENTER) and the corresponding save/restore of host state. - Userspace exits: introduce support for guest TDVMCALLs that KVM forwards to userspace. These correspond to the usual KVM_EXIT_* "heavyweight vmexits" but are triggered through a different mechanism, similar to VMGEXIT for SEV-ES and SEV-SNP. - Interrupt handling: support for virtual interrupt injection as well as handling VM-Exits that are caused by vectored events. Exclusive to TDX are machine-check SMIs, which the kernel already knows how to handle through the kernel machine check handler (commit 7911f145de5f, "x86/mce: Implement recovery for errors in TDX/SEAM non-root mode") - Loose ends: handling of the remaining exits from the TDX module, including EPT violation/misconfig and several TDVMCALL leaves that are handled in the kernel (CPUID, HLT, RDMSR/WRMSR, GetTdVmCallInfo); plus returning an error or ignoring operations that are not supported by TDX guests Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-04-07	media: uapi: v4l: Fix V4L2_TYPE_IS_OUTPUT condition	Nas Chung
	V4L2_TYPE_IS_OUTPUT() returns true for V4L2_BUF_TYPE_VIDEO_OVERLAY which definitely belongs to CAPTURE. Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com> Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-04-07	media: uapi: v4l: Change V4L2_TYPE_IS_CAPTURE condition	Nas Chung
	Explicitly compare a buffer type only with valid buffer types, to avoid matching a buffer type outside of the valid buffer type set. Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com> Reviewed-by: Michael Tretter <m.tretter@pengutronix.de> Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-04-07	Merge branch 'kvm-6.15-rc2-fixes' into HEAD	Paolo Bonzini

2025-04-07	drm/bridge: add support for refcounting	Luca Ceresoli
	DRM bridges are currently considered as a fixed element of a DRM card, and thus their lifetime is assumed to extend for as long as the card exists. New use cases, such as hot-pluggable hardware with video bridges, require DRM bridges to be added to and removed from a DRM card without tearing the card down. This is possible for connectors already (used by DP MST), it is now needed for DRM bridges as well. As a first preliminary step, make bridges reference-counted to allow a struct drm_bridge (along with the private driver structure embedding it) to stay allocated even after the driver has been removed, until the last reference is put. Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Reviewed-by: Maxime Ripard <mripard@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250326-drm-bridge-refcount-v9-2-5e0661fe1f84@bootlin.com Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
2025-04-07	drm/bridge: add devm_drm_bridge_alloc()	Luca Ceresoli
	Add a macro to allocate and initialize a DRM bridge embedded within a private driver struct. Compared to current practice, which is based on [devm_]kzalloc() allocation followed by open-coded initialization of fields, this allows to have a common and explicit API to allocate and initialize DRM bridges. Besides being useful to consolidate bridge driver code, this is a fundamental step in preparation for adding dynamic lifetime to bridges based on refcount. Reviewed-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250326-drm-bridge-refcount-v9-1-5e0661fe1f84@bootlin.com Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
2025-04-07	drm/sysfb: vesadrm: Add gamma correction	Thomas Zimmermann
	Add palette support and export GAMMA properties via sysfs. User-space compositors can use this interface for programming gamma ramps or night mode. Vesadrm supports palette updates via VGA DAC registers or VESA palette calls. Up to 256 palette entries are available. Userspace always supplies gamma ramps of 256 entries. If the native color format does not match this because pixel component have less then 8 bits, vesadrm interpolates among the palette entries. The code uses CamelCase style in a few places to match the VESA manuals. v3: - fix coding style v2: - use CONFIG_X86_32 instead of __i386__ (checkpatch) - protect struct vesadrm.pmi with CONFIG_X86_32 Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://lore.kernel.org/r/20250401094056.32904-19-tzimmermann@suse.de
2025-04-07	drm/sysfb: Add efidrm for EFI displays	Thomas Zimmermann
	Add support for screen_info setups with VIDEO_TYPE_EFI. Provide the minimum functionality of reading modes, updating and clearing the display. There is existing support for these displays provided by simpledrm with CONFIG_SYSFB_SIMPLEFB=y. Using efidrm over simpledrm will allows for the mapping of video memory with correct caching. Simpledrm always assumes WC caching, while fully cached memory is possible with efidrm. Efidrm will also allow for the use of additional functionality provided by EFI, such as EDID information. In addition to efidrm, add struct pixel_format plus initializer macros. The type and macros describe pixel formats in a generic way on order to find the DRM format from the screen_info settings. Similar existing code in SIMPLEFB_FORMATS and fbdev is not really what is needed in efidrm, but SIMPLEFB_FORMATS can later be converted to struct pixel_format. v4: - depend on CONFIG_EFI - disallow module for now as efi_mem_desc_lookup() is not exported v3: - depend on !SYSFB_SIMPLEFB (Javier) Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://lore.kernel.org/r/20250401094056.32904-15-tzimmermann@suse.de
2025-04-07	firmware: sysfb: Move bpp-depth calculation into screen_info helper	Thomas Zimmermann
	Move the calculation of the bits per pixels for screen_info into a helper function. This will make it available to other callers besides the firmware code. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://lore.kernel.org/r/20250401094056.32904-14-tzimmermann@suse.de
2025-04-07	dt-bindings: reset: Add T-HEAD TH1520 SoC Reset Controller	Michal Wilczynski
	Add a YAML schema for the T-HEAD TH1520 SoC reset controller. This controller manages resets for subsystems such as the GPU within the TH1520 SoC. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Michal Wilczynski <m.wilczynski@samsung.com> Link: https://lore.kernel.org/r/20250303152511.494405-2-m.wilczynski@samsung.com Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-04-07	genirq/generic-chip: Remove unused lock wrappers	Thomas Gleixner
	All users are converted to lock guards. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250313142524.388478168@linutronix.de
2025-04-07	genirq/generic-chip: Make locking unconditional	Thomas Gleixner
	The SMP conditional wrappers around raw_spin_[un]lock() have no real value. On !SMP kernels the lock operations are NOOPs except for a preempt_disable/enable() pair on PREEMPT enabled kernels, which are not really worth to optimize for. Aside of that this evades lockdep on !SMP kernels. Remove the !SMP stubs and make it unconditional. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250313142524.011345765@linutronix.de
2025-04-07	super: use common iterator (Part 2)	Christian Brauner
	Use a common iterator for all callbacks. We could go for something even more elaborate (advance step-by-step similar to iov_iter) but I really don't think this is warranted. Link: https://lore.kernel.org/r/20250329-work-freeze-v2-5-a47af37ecc3d@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07	super: use a common iterator (Part 1)	Christian Brauner
	Use a common iterator for all callbacks. Link: https://lore.kernel.org/r/20250329-work-freeze-v2-4-a47af37ecc3d@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07	fs: allow all writers to be frozen	James Bottomley
	During freeze/thaw we need to be able to freeze all writers during suspend/hibernate. Otherwise tasks such as systemd-journald that mmap a file and write to it will not be frozen after we've already frozen the filesystem. This has some risk of not being able to freeze processes in case a process has acquired SB_FREEZE_PAGEFAULT under mmap_sem or SB_FREEZE_INTERNAL under some other filesytem specific lock. If the filesystem is frozen, a task can block on the frozen filesystem with e.g., mmap_sem held. If some other task then blocks on grabbing that mmap_sem, hibernation ill fail because it is unable to hibernate a task holding mmap_sem. This could be fixed by making a range of filesystem related locks use freezable sleeping. That's impractical and not warranted just for suspend/hibernate. Assume that this is an infrequent problem and we've given userspace a way to skip filesystem freezing through a sysfs file. Link: https://lore.kernel.org/r/20250402-work-freeze-v2-2-6719a97b52ac@kernel.org Link: https://lore.kernel.org/r/20250327140613.25178-3-James.Bottomley@HansenPartnership.com [brauner: make all freeze levels set TASK_FREEZABLE and rewrite commit message] Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07	locking/percpu-rwsem: add freezable alternative to down_read	James Bottomley
	Percpu-rwsems are used for superblock locking. However, we know the read percpu-rwsem we take for sb_start_write() on a frozen filesystem needs not to inhibit system from suspending or hibernating. That means it needs to wait with TASK_UNINTERRUPTIBLE \| TASK_FREEZABLE. Introduce a new percpu_down_read_freezable() that allows us to control whether TASK_FREEZABLE is added to the wait flags. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Link: https://lore.kernel.org/r/20250327140613.25178-2-James.Bottomley@HansenPartnership.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07	fs: Remove aops->writepage	Matthew Wilcox (Oracle)
	All callers and implementations are now removed, so remove the operation and update the documentation to match. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250402150005.2309458-10-willy@infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07	shmem: Add shmem_writeout()	Matthew Wilcox (Oracle)
	This will be the replacement for shmem_writepage(). Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250402150005.2309458-6-willy@infradead.org Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07	irqdomain: Support three-cell scheme interrupts	Yixun Lan
	Add new function *_twothreecell() to extend support to parse three-cell interrupts which encoded as <instance hwirq irqflag>, the translate function will retrieve irq number and flag from last two cells. This API will be used in gpio irq driver which need to work with two or three cells cases. Signed-off-by: Yixun Lan <dlan@gentoo.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250326-04-gpio-irq-threecell-v3-1-aab006ab0e00@gentoo.org
2025-04-07	VFS: improve interface for lookup_one functions	NeilBrown
	The family of functions: lookup_one() lookup_one_unlocked() lookup_one_positive_unlocked() appear designed to be used by external clients of the filesystem rather than by filesystems acting on themselves as the lookup_one_len family are used. They are used by: btrfs/ioctl - which is a user-space interface rather than an internal activity exportfs - i.e. from nfsd or the open_by_handle_at interface overlayfs - at access the underlying filesystems smb/server - for file service They should be used by nfsd (more than just the exportfs path) and cachefs but aren't. It would help if the documentation didn't claim they should "not be called by generic code". Also the path component name is passed as "name" and "len" which are (confusingly?) separate by the "base". In some cases the len in simply "strlen" and so passing a qstr using QSTR() would make the calling clearer. Other callers do pass separate name and len which are stored in a struct. Sometimes these are already stored in a qstr, other times it easily could be. So this patch changes these three functions to receive a 'struct qstr *', and improves the documentation. QSTR_LEN() is added to make it easy to pass a QSTR containing a known len. [brauner@kernel.org: take a struct qstr pointer] Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/r/20250319031545.2999807-2-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07	irqchip/davinci: Remove leftover header	Bartosz Golaszewski
	Commit fa8dede4d0a0 ("irqchip: remove davinci aintc driver") removed the davinci aintc driver but left behind the associated header. Remove it now. Fixes: fa8dede4d0a0 ("irqchip: remove davinci aintc driver") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/all/20250306084552.15894-1-brgl@bgdev.pl
2025-04-07	mtd: spinand: Fix build with gcc < 7.5	Miquel Raynal
	__VA_OPT__ is a macro that is useful when some arguments can be present or not to entirely skip some part of a definition. Unfortunately, it is a too recent addition that some of the still supported old GCC versions do not know about, and is anyway not part of C11 that is the version used in the kernel. Find a trick to remove this macro, typically '__VA_ARGS__ + 0' is a workaround used in netlink.h which works very well here, as we either expect: - 0 - A positive value - No value, which means the field should be 0. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202503181330.YcDXGy7F-lkp@intel.com/ Fixes: 7ce0d16d5802 ("mtd: spinand: Add an optional frequency to read from cache macros") Cc: stable@vger.kernel.org Tested-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>