|
Commit 011832b97b31 ("bpf: Introduce may_goto instruction") added support
for the may_goto insn. The 'may_goto 0' insn is disallowed since it is
equivalent to a nop: both branches go to the next insn.
But compiler transformations may generate a 'may_goto 0' insn. Emil
Tsalapatis from Meta reported such a case which caused a verification
failure. For example, for the following code,
int i, tmp[3];
for (i = 0; i < 3 && can_loop; i++)
tmp[i] = 0;
...
clang 20 may generate code like
may_goto 2;
may_goto 1;
may_goto 0;
r1 = 0; /* tmp[0] = 0; */
r2 = 0; /* tmp[1] = 0; */
r3 = 0; /* tmp[2] = 0; */
Let us permit the 'may_goto 0' insn to avoid verification failures for
code like the above.
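For reference, a minimal BPF-C reproducer of that shape could look like
the sketch below (the section name, include set and the selftests'
bpf_experimental.h helper providing can_loop are assumptions; whether a
trailing 'may_goto 0' is actually emitted depends on the clang version
and flags):
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"	/* can_loop, from the BPF selftests */

SEC("tc")
int zero_small_array(struct __sk_buff *ctx)
{
	volatile int tmp[3];	/* volatile so the loop is not optimized away */
	int i;

	/* Same shape as the reported case: a small bounded loop guarded
	 * by can_loop, which clang lowers to may_goto insns.
	 */
	for (i = 0; i < 3 && can_loop; i++)
		tmp[i] = 0;

	return 0;
}

char _license[] SEC("license") = "GPL";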
Reported-by: Emil Tsalapatis <etsal@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20250118192024.2124059-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Features:
- Support caching symlink lengths in inodes
The size is stored in a new union utilizing the same space as
i_devices, thus avoiding growing the struct or taking up any more
space
When utilized it dodges strlen() in vfs_readlink(), giving about
1.5% speed up when issuing readlink on /initrd.img on ext4
- Add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag
If a file system supports uncached buffered IO, it may set
FOP_DONTCACHE and enable support for RWF_DONTCACHE.
If RWF_DONTCACHE is attempted without the file system supporting
it, it'll get errored with -EOPNOTSUPP (a usage sketch follows
after this message)
- Enable VBOXGUEST and VBOXSF_FS on ARM64
Now that VirtualBox is able to run as a host on arm64 (e.g. the
Apple M3 processors) we can enable VBOXSF_FS (and in turn
VBOXGUEST) for this architecture.
Tested with various runs of bonnie++ and dbench on an Apple MacBook
Pro with the latest Virtualbox 7.1.4 r165100 installed
Cleanups:
- Delay sysctl_nr_open check in expand_files()
- Use kernel-doc includes in fiemap docbook
- Use page->private instead of page->index in watch_queue
- Use a consume fence in mnt_idmap() as it's heavily used in
link_path_walk()
- Replace magic number 7 with ARRAY_SIZE() in fc_log
- Sort out a stale comment about races between fd alloc and dup2()
- Fix return type of do_mount() from long to int
- Various cosmetic cleanups for the lockref code
Fixes:
- Annotate spinning as unlikely() in __read_seqcount_begin
The annotation already used to be there, but got lost in commit
52ac39e5db51 ("seqlock: seqcount_t: Implement all read APIs as
statement expressions")
- Fix proc_handler for sysctl_nr_open
- Flush delayed work in delayed fput()
- Fix grammar and spelling in propagate_umount()
- Fix ESP not readable during coredump
In /proc/PID/stat, there is the kstkesp field which is the stack
pointer of a thread. While the thread is active, this field reads
zero. But during a coredump, it should have a valid value
However, at the moment, kstkesp is zero even during coredump
- Don't wake up the writer if the pipe is still full
- Fix unbalanced user_access_end() in select code"
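A hedged user-space usage sketch for the RWF_DONTCACHE flag mentioned
above (the fallback flag value mirrors the new uapi definition and is
an assumption here; -EOPNOTSUPP is returned when the filesystem does
not set FOP_DONTCACHE):
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080	/* assumed value, per the new uapi flag */
#endif

int main(int argc, char **argv)
{
	char buf[4096];
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	int fd = open(argc > 1 ? argv[1] : "/etc/hostname", O_RDONLY);
	ssize_t n;

	if (fd < 0)
		return 1;

	/* Buffered read whose pages are dropped from the page cache once
	 * the data has been copied to user space.
	 */
	n = preadv2(fd, &iov, 1, 0, RWF_DONTCACHE);
	if (n < 0 && errno == EOPNOTSUPP)
		fprintf(stderr, "filesystem does not support RWF_DONTCACHE\n");

	close(fd);
	return n < 0;
}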
* tag 'vfs-6.14-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits)
gfs2: use lockref_init for qd_lockref
erofs: use lockref_init for pcl->lockref
dcache: use lockref_init for d_lockref
lockref: add a lockref_init helper
lockref: drop superfluous externs
lockref: use bool for false/true returns
lockref: improve the lockref_get_not_zero description
lockref: remove lockref_put_not_zero
fs: Fix return type of do_mount() from long to int
select: Fix unbalanced user_access_end()
vbox: Enable VBOXGUEST and VBOXSF_FS on ARM64
pipe_read: don't wake up the writer if the pipe is still full
selftests: coredump: Add stackdump test
fs/proc: do_task_stat: Fix ESP not readable during coredump
fs: add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag
fs: sort out a stale comment about races between fd alloc and dup2
fs: Fix grammar and spelling in propagate_umount()
fs: fc_log replace magic number 7 with ARRAY_SIZE()
fs: use a consume fence in mnt_idmap()
file: flush delayed work in delayed fput()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull /proc/kcore updates from Christian Brauner:
"The performance of /proc/kcore reads has been showing up as a
bottleneck for the drgn debugger. drgn scripts often spend ~25% of
their time in the kernel reading from /proc/kcore.
A lot of this overhead comes from silly inefficiencies. This pull
request contains fixes for the low-hanging fruit. The fixes are all
fairly small and straightforward.
The result is a 25% improvement in read latency in micro-benchmarks
(from ~235 nanoseconds to ~175) and a 15% improvement in execution
time for real-world drgn scripts:
- Make /proc/kcore entry permanent
- Avoid walking the list on every read
- Use percpu_rw_semaphore for kclist_lock
- Make Omar Sandoval the official maintainer for /proc/kcore"
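The kind of micro-benchmark referred to above can be approximated by
timing repeated small reads of /proc/kcore (needs root; a rough sketch,
not the benchmark actually used):
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	char buf[64];
	struct timespec t0, t1;
	long iters = 1000000, i;
	int fd = open("/proc/kcore", O_RDONLY);

	if (fd < 0)
		return 1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++)
		pread(fd, buf, sizeof(buf), 0);	/* read the ELF header bytes */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("%.1f ns per read\n",
	       ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / iters);
	close(fd);
	return 0;
}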
* tag 'vfs-6.14-rc1.kcore' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
MAINTAINERS: add me as /proc/kcore maintainer
proc/kcore: use percpu_rw_semaphore for kclist_lock
proc/kcore: don't walk list on every read
proc/kcore: mark proc entry as permanent
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs netfs updates from Christian Brauner:
"This contains read performance improvements and support for monolithic
single-blob objects that have to be read/written as such (e.g. AFS
directory contents). The implementation of the two parts is interwoven
as each makes the other possible.
- Read performance improvements
The read performance improvements are intended to address some
loss of performance detected in cifs and, to a lesser extent, in afs.
The problem is that we queue too many work items during the
collection of read results: each individual subrequest is collected
by its own work item, and then they have to interact with each
other when a series of subrequests don't exactly align with the
pattern of folios that are being read by the overall request.
Whilst the processing of the pages covered by individual
subrequests as they complete potentially allows folios to be woken
in parallel and with minimum delay, it can shuffle wakeups for
sequential reads out of order - and that is the most common I/O
pattern.
The final assessment and cleanup of an operation is then held up
until the last I/O completes - and for a synchronous sequential
operation, this means the bouncing around of work items just adds
latency.
Two changes have been made to make this work:
(1) All collection is now done in a single "work item" that works
progressively through the subrequests as they complete (and
also dispatches retries as necessary).
(2) For readahead and AIO, this work item may be done on a workqueue
and can run in parallel with the ultimate consumer of the data;
for synchronous direct or unbuffered reads, the collection is
run in the application thread and not offloaded.
Functions such as smb2_readv_callback() then just tell netfslib
that the subrequest has terminated; netfslib does a minimal bit of
processing on the spot - stat counting and tracing mostly - and
then queues/wakes up the worker. This simplifies the logic as the
collector just walks sequentially through the subrequests as they
complete and walks through the folios, if buffered, unlocking them
as it goes. It also keeps to a minimum the amount of latency
injected into the filesystem's low-level I/O handling
The way netfs supports filesystems using the deprecated
PG_private_2 flag is changed: folios are flagged and added to a
write request as they complete and that takes care of scheduling
the writes to the cache. The originating read request can then just
unlock the pages whatever happens.
- Single-blob object support
Single-blob objects are files for which the content of the file
must be read from or written to the server in a single operation
because reading them in parts may yield inconsistent results. AFS
directories are an example of this as there exists the possibility
that the contents are generated on the fly and would differ between
reads or might change due to third party interference.
Such objects will be written to and retrieved from the cache if one
is present, though we allow/may need to propose multiple
subrequests to do so. The important part is that read from/write to
the *server* is monolithic.
Single blob reading is, for the moment, fully synchronous and does
result collection in the application thread and, also for the
moment, the API is supplied the buffer in the form of a folio_queue
chain rather than using the pagecache.
- Related afs changes
This series makes a number of changes to the kafs filesystem,
primarily in the area of directory handling:
- AFS's FetchData RPC reply processing is made partially
asynchronous which allows the netfs_io_request's outstanding
operation counter to be removed as part of reducing the
collection to a single work item.
- Directory and symlink reading are plumbed through netfslib using
the single-blob object API and are now cacheable with fscache.
This also allows the afs_read struct to be eliminated and
netfs_io_subrequest to be used directly instead.
- Directory and symlink content are now stored in a folio_queue
buffer rather than in the pagecache. This means we don't require
the RCU read lock and xarray iteration to access it, and folios
won't randomly disappear under us because the VM wants them
back.
- The vnode operation lock is changed from a mutex struct to a
private lock implementation. The problem is that the lock now
needs to be dropped in a separate thread and mutexes don't
permit that.
- When a new directory or symlink is created, we now initialise it
locally and mark it valid rather than downloading it (we know
what it's likely to look like).
- We now use the in-directory hashtable to reduce the number of
entries we need to scan when doing a lookup. The edit routines
have to maintain the hash chains.
- Cancellation (e.g. by signal) of an async call after the
rxrpc_call has been set up is now offloaded to the worker thread
as there will be a notification from rxrpc upon completion. This
avoids a double cleanup.
- A "rolling buffer" implementation is created to abstract out the
two separate folio_queue chaining implementations I had (one for
read and one for write).
- Functions are provided to create/extend a buffer in a folio_queue
chain and tear it down again.
This is used to handle AFS directories, but could also be used to
create bounce buffers for content crypto and transport crypto.
- The was_async argument is dropped from netfs_read_subreq_terminated()
Instead we wake the read collection work item by either queuing it
or waking up the app thread.
- We don't need to use BH-excluding locks when communicating between
the issuing thread and the collection thread as neither of them now
run in BH context.
- Also included are a number of new tracepoints; a split of the
netfslib write collection code to put retrying into its own file
(it gets more complicated with content encryption).
- Also included are a number of minor AFS fixes, including fixing the
AFS directory format struct layout, reducing some directory
over-invalidation and making afs_rmdir() translate EEXIST to
ENOTEMPTY (which is not available on all systems the servers
support).
- Finally, there's a patch to try and detect entry into the folio
unlock function with no folio_queue structs in the buffer (which
isn't allowed in the cases that can get there).
This is a debugging patch, but should be minimal overhead"
* tag 'vfs-6.14-rc1.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits)
netfs: Report on NULL folioq in netfs_writeback_unlock_folios()
afs: Add a tracepoint for afs_read_receive()
afs: Locally initialise the contents of a new symlink on creation
afs: Use the contained hashtable to search a directory
afs: Make afs_mkdir() locally initialise a new directory's content
netfs: Change the read result collector to only use one work item
afs: Make {Y,}FS.FetchData an asynchronous operation
afs: Fix cleanup of immediately failed async calls
afs: Eliminate afs_read
afs: Use netfslib for symlinks, allowing them to be cached
afs: Use netfslib for directories
afs: Make afs_init_request() get a key if not given a file
netfs: Add support for caching single monolithic objects such as AFS dirs
netfs: Add functions to build/clean a buffer in a folio_queue
afs: Add more tracepoints to do with tracking validity
cachefiles: Add auxiliary data trace
cachefiles: Add some subrequest tracepoints
netfs: Remove some extraneous directory invalidations
afs: Fix directory format encoding struct
afs: Fix EEXIST error returned from afs_rmdir() to be ENOTEMPTY
...
|
|
Hou Tao says:
====================
The patch set continues the previous work [1] to move all freeing of
htab elements out of the bucket lock. One motivation for the patch set
is the locking problem reported by Sebastian [2]: the freeing of a
bpf_timer under PREEMPT_RT may acquire a spin-lock (namely
softirq_expiry_lock). However, the freeing procedure for an htab
element already holds a raw spin-lock (namely the bucket lock), and
this triggers the warning "BUG: scheduling while atomic", as
demonstrated by the selftests patch.
Another motivation is to reduce the locked scope of the bucket lock.
However, the patch set doesn't move all freeing of htab elements out of
the bucket lock; it still keeps the freeing of special fields in
pre-allocated hash maps under the protection of the bucket lock in
htab_map_update_elem(). The patch set is structured as follows:
* Patch #1 moves the element freeing out of bucket lock for
htab_lru_map_delete_node(). However the freeing is still in the locked
scope of LRU raw spin lock.
* Patch #2~#3 move the element freeing out of bucket lock for
__htab_map_lookup_and_delete_elem()
* Patch #4 cancels the bpf_timer in two steps to fix the locking
problem in htab_map_update_elem() for PREEMPT_RT.
* Patch #5 adds a selftest for the locking problem
Please see individual patches for more details. Comments are always
welcome.
---
v3:
* patch #1: update the commit message to state that the freeing of
special field is still in the locked scope of LRU raw spin lock
* patch #4: cancel the bpf_timer in two steps only for PREEMPT_RT
(suggested by Alexei)
v2: https://lore.kernel.org/bpf/20250109061901.2620825-1-houtao@huaweicloud.com
* cancels the bpf timer in two steps instead of breaking the reuse and
refill of per-cpu ->extra_elems into two steps
v1: https://lore.kernel.org/bpf/20250107085559.3081563-1-houtao@huaweicloud.com
[1]: https://lore.kernel.org/bpf/20241106063542.357743-1-houtao@huaweicloud.com
[2]: https://lore.kernel.org/bpf/20241106084527.4gPrMnHt@linutronix.de
====================
Link: https://patch.msgid.link/20250117101816.2101857-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The main purpose of the test is to demonstrate the locking problem for
the freeing of a bpf_timer under PREEMPT_RT. When freeing a bpf_timer
which is running on another CPU in bpf_timer_cancel_and_free(),
hrtimer_cancel() will try to acquire a spin-lock (namely
softirq_expiry_lock); however, the freeing procedure already holds a
raw spin-lock.
The test first creates two threads: one to start timers and the other
to free timers. The start-timers thread starts the timers and then
wakes up the free-timers thread to free them once the starts complete.
After freeing, the free-timers thread wakes up the start-timers thread
to complete the current iteration. A loop of 10 iterations is used.
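A hedged user-space skeleton of that coordination (the BPF specifics,
i.e. running the program that arms the timers and deleting the map
elements that free them, are left as placeholder comments):
#include <pthread.h>
#include <stdbool.h>

#define ITERATIONS 10

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool timers_started;
static bool timers_freed = true;

static void *start_timers_fn(void *arg)
{
	for (int i = 0; i < ITERATIONS; i++) {
		pthread_mutex_lock(&lock);
		while (!timers_freed)
			pthread_cond_wait(&cond, &lock);
		timers_freed = false;
		/* run the BPF program that bpf_timer_start()s the timers */
		timers_started = true;
		pthread_cond_signal(&cond);
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

static void *free_timers_fn(void *arg)
{
	for (int i = 0; i < ITERATIONS; i++) {
		pthread_mutex_lock(&lock);
		while (!timers_started)
			pthread_cond_wait(&cond, &lock);
		timers_started = false;
		/* delete the map elements so the running timers get freed */
		timers_freed = true;
		pthread_cond_signal(&cond);
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, start_timers_fn, NULL);
	pthread_create(&b, NULL, free_timers_fn, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}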
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250117101816.2101857-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
During the update procedure, when overwriting an element in a
pre-allocated htab, the freeing of the old_element is protected by the
bucket lock. The bucket lock is necessary because the old_element has
already been stashed in htab->extra_elems when alloc_htab_elem()
returns. If the old_element were freed after the bucket lock is
unlocked, the stashed element could be reused by a concurrent update
procedure, and the freeing of the old_element would run concurrently
with its reuse. However, the invocation of check_and_free_fields() may
acquire a spin-lock, which violates the lockdep rules because its
caller already holds a raw spin-lock (the bucket lock). The following
warning is reported when such a race happens:
BUG: scheduling while atomic: test_progs/676/0x00000003
3 locks held by test_progs/676:
#0: ffffffff864b0240 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0x2c0/0x830
#1: ffff88810e961188 (&htab->lockdep_key){....}-{2:2}, at: htab_map_update_elem+0x306/0x1500
#2: ffff8881f4eac1b8 (&base->softirq_expiry_lock){....}-{2:2}, at: hrtimer_cancel_wait_running+0xe9/0x1b0
Modules linked in: bpf_testmod(O)
Preemption disabled at:
[<ffffffff817837a3>] htab_map_update_elem+0x293/0x1500
CPU: 0 UID: 0 PID: 676 Comm: test_progs Tainted: G ... 6.12.0+ #11
Tainted: [W]=WARN, [O]=OOT_MODULE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)...
Call Trace:
<TASK>
dump_stack_lvl+0x57/0x70
dump_stack+0x10/0x20
__schedule_bug+0x120/0x170
__schedule+0x300c/0x4800
schedule_rtlock+0x37/0x60
rtlock_slowlock_locked+0x6d9/0x54c0
rt_spin_lock+0x168/0x230
hrtimer_cancel_wait_running+0xe9/0x1b0
hrtimer_cancel+0x24/0x30
bpf_timer_delete_work+0x1d/0x40
bpf_timer_cancel_and_free+0x5e/0x80
bpf_obj_free_fields+0x262/0x4a0
check_and_free_fields+0x1d0/0x280
htab_map_update_elem+0x7fc/0x1500
bpf_prog_9f90bc20768e0cb9_overwrite_cb+0x3f/0x43
bpf_prog_ea601c4649694dbd_overwrite_timer+0x5d/0x7e
bpf_prog_test_run_syscall+0x322/0x830
__sys_bpf+0x135d/0x3ca0
__x64_sys_bpf+0x75/0xb0
x64_sys_call+0x1b5/0xa10
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
...
</TASK>
It seems feasible to break the reuse and refill of the per-cpu
extra_elems into two independent parts: reuse the per-cpu extra_elems
with the bucket lock held, and refill the old_element as the per-cpu
extra_elems after the bucket lock is unlocked. However, that would make
concurrent overwrite procedures on the same CPU return an unexpected
-E2BIG error when the map is full.
Therefore, the patch fixes the locking problem by breaking the
cancelling of the bpf_timer into two steps for PREEMPT_RT:
1) use hrtimer_try_to_cancel() and check its return value
2) if the timer is running, use hrtimer_cancel() through a kworker to
cancel it again
Considering that the current implementation of hrtimer_cancel() will
try to acquire an already-held softirq_expiry_lock when the timer is
running, the steps above are reasonable. However, they also have a
downside: when the timer is running, its cancelling is delayed when
releasing the last map uref. The delay is also fixable (e.g., by
breaking the cancelling of the bpf_timer into two parts: one part in
the locked scope, another one in the unlocked scope), so it can be
revised later if necessary.
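Schematically, the two-step cancel for PREEMPT_RT looks roughly like
the sketch below (struct layout and the kworker used for the deferred
hrtimer_cancel() are assumptions, not the literal upstream diff):
/* Sketch: t embeds a struct hrtimer 'timer', an rcu_head 'rcu' and a
 * work_struct 'delete_work' whose handler calls hrtimer_cancel().
 */
static void bpf_timer_cancel_and_free_rt(struct bpf_hrtimer *t)
{
	/* Step 1: cancel without waiting for a running callback. */
	if (hrtimer_try_to_cancel(&t->timer) >= 0) {
		kfree_rcu(t, rcu);
		return;
	}

	/* Step 2: the callback is running on another CPU. Waiting here
	 * would take softirq_expiry_lock (a sleeping lock on RT) while
	 * the caller may hold a raw spin-lock, so defer the blocking
	 * hrtimer_cancel() to a kworker instead.
	 */
	queue_work(system_unbound_wq, &t->delete_work);
}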
It is a bit hard to decide the right fix tag. One reason is that the
problem depends on PREEMPT_RT which is enabled in v6.12. Considering the
softirq_expiry_lock lock exists since v5.4 and bpf_timer is introduced
in v5.15, the bpf_timer commit is used in the fixes tag and an extra
depends-on tag is added to state the dependency on PREEMPT_RT.
Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
Depends-on: v6.12+ with PREEMPT_RT enabled
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Closes: https://lore.kernel.org/bpf/20241106084527.4gPrMnHt@linutronix.de
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>
Link: https://lore.kernel.org/r/20250117101816.2101857-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The freeing of special fields in a map value may acquire a spin-lock
(e.g., the freeing of bpf_timer); however, the lookup_and_delete_elem
procedure already holds a raw spin-lock, which violates the lockdep
rules.
The running context of __htab_map_lookup_and_delete_elem() already has
migration disabled. Therefore, it is OK to invoke free_htab_elem()
after unlocking the bucket lock.
Fix the potential problem by freeing the element after unlocking the
bucket lock in __htab_map_lookup_and_delete_elem().
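In other words, the unlink still happens under the bucket lock, but the
element is handed to free_htab_elem() only after the lock is dropped;
roughly (helper names as in kernel/bpf/hashtab.c, exact signatures
assumed):
	/* ... look up 'l' and unlink it while holding the bucket lock ... */
	hlist_nulls_del_rcu(&l->hash_node);
	htab_unlock_bucket(htab, b, hash, bflags);
	free_htab_elem(htab, l);	/* may free a bpf_timer; now runs unlocked */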
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250117101816.2101857-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Use a goto statement to bail out early when the target element is not
found, instead of using a large else branch to handle the more likely
case. This change doesn't affect functionality and simply makes the
code cleaner.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>
Link: https://lore.kernel.org/r/20250117101816.2101857-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When a bpf_timer is used in an LRU hash map, calling
check_and_free_fields() in htab_lru_map_delete_node() will invoke
bpf_timer_cancel_and_free() to free the bpf_timer. If the timer is
running on another CPU, hrtimer_cancel() will invoke
hrtimer_cancel_wait_running() to spin on the current CPU to wait for
the completion of the hrtimer callback.
The deletion has already acquired a raw spin-lock (the bucket lock),
so, to reduce the time holding the bucket lock, move the invocation of
check_and_free_fields() out of the bucket lock. However, because
htab_lru_map_delete_node() is invoked with the LRU raw spin lock held,
the freeing of special fields still happens in a locked scope.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>
Link: https://lore.kernel.org/r/20250117101816.2101857-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Merge ACPI battery and fan driver updates and miscellaneous ACPI
changes for 6.14:
- Update messages printed by the ACPI battery driver to always
refer to driver extensions as "hooks" to avoid confusion with
similar functionality in the power supply subsystem in the
future (Thomas Weißschuh).
- Fix .probe() error path cleanup in the ACPI fan driver to avoid
memory leaks (Joe Hattori).
- Constify 'struct bin_attribute' in some places in the ACPI subsystem
and mark it as __ro_after_init in one place to prevent binary blob
attributes from being updated (Thomas Weißschuh).
- Add empty stubs for several ACPI-related symbols so that they can be
used when CONFIG_ACPI is unset and use them for removing unnecessary
conditional compilation from the ipu-bridge driver (Ricardo Ribalda).
* acpi-battery:
ACPI: battery: Rename extensions to hook in messages
* acpi-fan:
ACPI: fan: cleanup resources in the error path of .probe()
* acpi-misc:
media: ipu-bridge: Remove unneeded conditional compilations
ACPI: bus: implement acpi_device_hid when !ACPI
ACPI: bus: implement for_each_acpi_consumer_dev when !ACPI
ACPI: header: implement acpi_device_handle when !ACPI
ACPI: bus: implement acpi_get_physical_device_location when !ACPI
ACPI: bus: implement for_each_acpi_dev_match when !ACPI
ACPI: bus: change the prototype for acpi_get_physical_device_location
ACPI: sysfs: Constify 'struct bin_attribute'
ACPI: BGRT: Constify 'struct bin_attribute'
ACPI: BGRT: Mark bin_attribute as __ro_after_init
|
|
This was a suggestion by David Laight, and while I was slightly worried
that some micro-architecture would predict cmov like a conditional
branch, there is little reason to actually believe any core would be
that broken.
Intel documents that their existing cores treat CMOVcc as a data
dependency that will constrain speculation in their "Speculative
Execution Side Channel Mitigations" whitepaper:
"Other instructions such as CMOVcc, AND, ADC, SBB and SETcc can also
be used to prevent bounds check bypass by constraining speculative
execution on current family 6 processors (Intel® Core™, Intel® Atom™,
Intel® Xeon® and Intel® Xeon Phi™ processors)"
and while that leaves the future uarch issues open, that's certainly
true of our traditional SBB usage too.
Any core that predicts CMOV will be unusable for various crypto
algorithms that need data-independent timing stability, so let's just
treat CMOV as the safe choice that simplifies the address masking by
avoiding an extra instruction and doesn't need a temporary register.
Suggested-by: David Laight <David.Laight@aculab.com>
Link: https://www.intel.com/content/dam/develop/external/us/en/documents/336996-speculative-execution-side-channel-mitigations.pdf
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
'acpi-apei'
Merge assorted changes in ACPI library code for 6.14:
- Use usleep_range() instead of msleep() in acpi_os_sleep() to reduce
excessive delays due to timer inaccuracy, mostly affecting system
suspend and resume (Rafael Wysocki).
- Use str_enabled_disabled() string helpers in the ACPI tables parsing
code to make it easier to follow (Sunil V L).
- Update device properties parsing on systems using ACPI so that
data firmware nodes resulting from _DSD evaluation are treated as
available in firmware nodes walks (Sakari Ailus).
- Fix missing guid_t declaration in linux/prmt.h (Robert Richter).
- Update the GHES handling code to follow the global panic= instead of
overriding it by force-rebooting the system after a fatal hw error
has been reported (Borislav Petkov).
* acpi-osl:
ACPI: OSL: Use usleep_range() in acpi_os_sleep()
* acpi-tables:
ACPI: tables: Use string choice helpers
* acpi-property:
ACPI: property: Consider data nodes as being available
* acpi-prm:
ACPI: PRM: Fix missing guid_t declaration in linux/prmt.h
* acpi-apei:
APEI: GHES: Have GHES honor the panic= setting
|
|
Back when we added SMAP support, all versions of binutils didn't
necessarily understand the 'clac' and 'stac' instructions. So we
implemented those instructions manually as ".byte" sequences.
But we've since upgraded the minimum version of binutils to version
2.25, and that included proper support for the SMAP instructions, and
there's no reason for us to use some line noise to express them any
more.
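Concretely, the change boils down to something like the following
(macro names are illustrative; the 0x0f,0x01,0xca/0xcb byte sequences
are the CLAC/STAC encodings):
/* before: hand-encoded for pre-2.25 binutils */
#define __ASM_CLAC	".byte 0x0f,0x01,0xca"
#define __ASM_STAC	".byte 0x0f,0x01,0xcb"

/* after: binutils >= 2.25 understands the mnemonics */
#define __ASM_CLAC	"clac"
#define __ASM_STAC	"stac"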
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The fault->mutex serializes the fault read()/write() fops and the
iommufd_fault_auto_response_faults(), mainly for fault->response. Also, it
was conveniently used to fence the fault->deliver in poll() fop and
iommufd_fault_iopf_handler().
However, copy_from/to_user() may sleep if pagefaults are enabled. Thus,
they could take a long time to wait for user pages to swap in, blocking
iommufd_fault_iopf_handler() and its caller that is typically a shared IRQ
handler of an IOMMU driver, resulting in a potential global DOS.
Instead of reusing the mutex to protect the fault->deliver list, add a
separate spinlock, nested under the mutex, to do the job.
iommufd_fault_iopf_handler() would no longer be blocked by
copy_from/to_user().
Add a free_list in iommufd_auto_response_faults(), so the spinlock can
simply fence a fast list_for_each_entry_safe routine.
Provide two deliver list helpers for iommufd_fault_fops_read() to use:
- Fetch the first iopf_group out of the fault->deliver list
- Restore an iopf_group back to the head of the fault->deliver list
Lastly, move the mutex closer to the response in the fault structure,
and update its kdoc accordingly.
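The two deliver-list helpers could look roughly like this (a sketch;
member names such as fault->lock, fault->deliver and group->node follow
the description above and are assumptions):
static struct iopf_group *
iommufd_fault_deliver_fetch(struct iommufd_fault *fault)
{
	struct list_head *list = &fault->deliver;
	struct iopf_group *group = NULL;

	spin_lock(&fault->lock);
	if (!list_empty(list)) {
		group = list_first_entry(list, struct iopf_group, node);
		list_del(&group->node);
	}
	spin_unlock(&fault->lock);
	return group;
}

static void iommufd_fault_deliver_restore(struct iommufd_fault *fault,
					  struct iopf_group *group)
{
	spin_lock(&fault->lock);
	list_add(&group->node, &fault->deliver);
	spin_unlock(&fault->lock);
}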
Fixes: 07838f7fd529 ("iommufd: Add iommufd fault object")
Link: https://patch.msgid.link/r/20250117192901.79491-1-nicolinc@nvidia.com
Cc: stable@vger.kernel.org
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
EXPORT_SYMBOL_GPL() is missing for __cpuid_to_hartid_map array.
Export this symbol to allow drivers compiled as modules to use
cpuid_to_hartid_map().
Signed-off-by: Valentina Fernandez <valentina.fernandezalanis@microchip.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
|
|
Add Microchip Technology to the RISC-V vendor list.
Signed-off-by: Valentina Fernandez <valentina.fernandezalanis@microchip.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
|
|
If the idle state exit latency constraint is sufficiently low, it
is better to avoid the additional latency related to calling
tick_nohz_get_sleep_length(). It is also not necessary to compute
the sleep length in that case because shallow idle state selection
will be forced then regardless of the recent wakeup history.
Accordingly, skip the sleep length computation and the subsequent
checks if the exit latency constraint is low enough.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/6122398.lOV4Wx5bFT@rjwysocki.net
|
|
After recent updates, the time_span_ns field in struct teo_cpu has
become an indicator of whether or not the most recent wakeup was
"genuine", which may as well be indicated by a bool field without
calling local_clock(), so update the code accordingly.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/6010475.MhkbZ0Pkbq@rjwysocki.net
|
|
Instead of computing the total events count from scratch every time,
decay it and add a PULSE value to it in teo_update().
No intentional functional impact.
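The decayed running counter amounts to the usual exponential-decay
pattern, roughly as sketched below (the PULSE and DECAY_SHIFT values
are the governor's conventional constants, treated as assumptions
here):
#define PULSE		1024
#define DECAY_SHIFT	3

static unsigned int total;

static void update_total(void)
{
	total -= total >> DECAY_SHIFT;	/* age the previous total */
	total += PULSE;			/* account for the new event */
}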
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/9388883.CDJkKcVGEf@rjwysocki.net
|
|
Commit 6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length()
call in some cases") attempted to reduce the governor overhead in some
cases by making it avoid obtaining the sleep length (the time till the
next timer event) which may be costly.
Among other things, after the above commit, tick_nohz_get_sleep_length()
was not called any more when idle state 0 was to be returned, which
turned out to be problematic and the previous behavior in that respect
was restored by commit 4b20b07ce72f ("cpuidle: teo: Don't count non-
existent intercepts").
However, commit 6da8f9ba5a87 also caused the governor to avoid calling
tick_nohz_get_sleep_length() on systems where idle state 0 is a "polling"
one (that is, it is not really an idle state, but a loop continuously
executed by the CPU) when the target residency of the idle state to be
returned was low enough, so there was no practical need to refine the
idle state selection in any way. This change was not removed by the
other commit, so now on systems where idle state 0 is a "polling" one,
tick_nohz_get_sleep_length() is called when idle state 0 is to be
returned, but it is not called when a deeper idle state with
sufficiently low target residency is to be returned. That is arguably
confusing and inconsistent.
Moreover, there is no specific reason why the behavior in question
should depend on whether or not idle state 0 is a "polling" one.
One way to address this would be to make the governor always call
tick_nohz_get_sleep_length() to obtain the sleep length, but that would
effectively mean reverting commit 6da8f9ba5a87 and restoring the latency
issue that was the reason for doing it. This approach is thus not
particularly attractive.
To address it differently, notice that if a CPU is woken up very often,
this is not likely to be caused by timers in the first place (user space
has a default timer slack of 50 us and there are relatively few timers
with a deadline shorter than several microseconds in the kernel) and
even if it were the case, the potential benefit from using a deep idle
state would then be questionable for latency reasons. Therefore, if the
majority of CPU wakeups occur within several microseconds, it can be
assumed that all wakeups in that range are non-timer and the sleep
length need not be determined.
Accordingly, introduce a new metric for counting wakeups with the
measured idle duration below RESIDENCY_THRESHOLD_NS and modify the idle
state selection to skip the tick_nohz_get_sleep_length() invocation if
idle state 0 has been selected or the target residency of the candidate
idle state is below RESIDENCY_THRESHOLD_NS and the value of the new
metric is at least 1/2 of the total event count.
Since the above requires the measured idle duration to be determined
every time, except for the cases when one of the safety nets has
triggered in which the wakeup is counted as a hit in the deepest
idle state idle residency range, update the handling of those cases
to avoid skipping the idle duration computation when the CPU wakeup
is "genuine".
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3851791.kQq0lBPeGt@rjwysocki.net
Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Renamed a struct field ]
[ rjw: Fixed typo in the subject and one in a comment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Replace the tick_hits metric with a new tick_intercepts one that can be
used directly when deciding whether or not to stop the scheduler tick
and update the governor functional description accordingly.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/1987985.PYKUYFuaPT@rjwysocki.net
|
|
Rewrite two code comments that are supposed to explain the code's
behavior but are too concise or not sufficiently clear.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/8472971.T7Z3S40VBb@rjwysocki.net
[ rjw: Fixed 2 typos in new comments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Local variable prev_intercept_idx in teo_select() is redundant because
it cannot be 0 when candidate state index is 0.
The prev_intercept_idx value is the index of the deepest enabled idle
state, so if it is 0, state 0 is the deepest enabled idle state, in
which case it must be the only enabled idle state, but then teo_select()
would have returned early before initializing prev_intercept_idx.
Thus prev_intercept_idx must be nonzero and the check of it against 0
always passes, so it can be dropped altogether.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/3327997.aeNJFYEL58@rjwysocki.net
[ rjw: Fixed typo in the changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
There are two candidate state index checks against 0 in teo_select()
that need not be separate, so combine them and update comments around
them.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/13676346.uLZWGnKmhe@rjwysocki.net
|
|
Since constraint_idx may be 0, the candidate state index may change to 0
after assigning constraint_idx to it, so first check if it is greater
than constraint_idx (and update it if so) and then check it against 0.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/1907276.tdWV9SEqCh@rjwysocki.net
|
|
Rearrange code in the idle state lookup loop in teo_select() to make it
somewhat easier to follow and update comments around it.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/4619938.LvFx2qVVIh@rjwysocki.net
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Updates for v6.14
This was quite a quiet release for what I imagine are holiday related
reasons, the diffstat is dominated by some Cirrus Logic Kunit tests.
There's the usual mix of small improvements and fixes, plus a few new
drivers and features. The diffstat includes some DRM changes due to
work on HDMI audio.
- Allow clocking on each DAI in an audio graph card to be configured
separately.
- Improved power management for Renesas RZ-SSI.
- KUnit testing for the Cirrus DSP framework.
- Memory to memory operation support for Freescale/NXP platforms.
- Support for pause operations in SOF.
- Support for Allwinner suniv F1C100s, Awinic AW88083, Realtek
ALC5682I-VE
|
|
Modified the configuration to improve PSRR when the ES8326 is working
Signed-off-by: Zhang Yi <zhangyi@everest-semi.com>
Link: https://patch.msgid.link/20250120101758.13347-1-zhangyi@everest-semi.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The asrc_m2m_device_run() function is the main processing function
for conversion; errors need to be returned to the user so that the
user can handle the error case properly.
Fixes: 24a01710f627 ("ASoC: fsl_asrc_m2m: Add memory to memory function")
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Link: https://patch.msgid.link/20250120081938.2501554-3-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
ASRC memory-to-memory cases and memory-to-peripheral cases share the
same pair pools, so the pairs found by the m2m suspend function may
actually be used for memory-to-peripheral, which is handled by the
memory-to-peripheral driver and can't be handled in the
memory-to-memory suspend function.
Use "pair->dma_buffer" as a flag for the memory-to-memory case: when
it is allocated, handle the suspend operation for the related pairs.
Fixes: 24a01710f627 ("ASoC: fsl_asrc_m2m: Add memory to memory function")
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Link: https://patch.msgid.link/20250120081938.2501554-2-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
tasdevice_spi_switch_book()
Clang warns (or errors with CONFIG_WERROR=y):
sound/pci/hda/tas2781_hda_spi.c:110:6: error: variable 'ret' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
110 | if (tas_priv->cur_book != TASDEVICE_BOOK_ID(reg)) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sound/pci/hda/tas2781_hda_spi.c:119:9: note: uninitialized use occurs here
119 | return ret;
| ^~~
sound/pci/hda/tas2781_hda_spi.c:110:2: note: remove the 'if' if its condition is always true
110 | if (tas_priv->cur_book != TASDEVICE_BOOK_ID(reg)) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sound/pci/hda/tas2781_hda_spi.c:108:9: note: initialize the variable 'ret' to silence this warning
108 | int ret;
| ^
| = 0
Sink the declaration of ret into the if block and just return 0 at the
end of the function, as there is nothing to do if cur_book has already
been changed.
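A sketch of the resulting shape (tasdevice_spi_change_book() is a
hypothetical stand-in for the actual book-switch write):
static int tasdevice_spi_switch_book(struct tasdevice_priv *tas_priv, int reg)
{
	if (tas_priv->cur_book != TASDEVICE_BOOK_ID(reg)) {
		int ret = tasdevice_spi_change_book(tas_priv,
						    TASDEVICE_BOOK_ID(reg));

		if (ret < 0)
			return ret;
		tas_priv->cur_book = TASDEVICE_BOOK_ID(reg);
	}
	return 0;	/* nothing to do if the book is already current */
}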
Fixes: bb5f86ea50ff ("ALSA: hda/tas2781: Add tas2781 hda SPI driver")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501192006.Hm9GmKiV-lkp@intel.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20250120-tas2781_hda_spi-fix-wsometimes-uninitialized-v1-1-d7fd104aa63e@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
|
HWMON support has been added for the RTL8221/8251 PHYs integrated together
with the MAC inside the RTL8125/8126 chips. This patch extends temperature
reading support for standalone variants of the mentioned PHYs.
I don't know whether the earlier revisions of the RTL8226 also have a
built-in temperature sensor, so they have been skipped for now.
Tested on RTL8221B-VB-CG.
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I noticed that HyStart incorrectly marks the start of rounds,
leading to inaccurate measurements of ACK train lengths and
resetting the `ca->sample_cnt` variable. This inaccuracy can impact
HyStart's functionality in terminating exponential cwnd growth during
Slow-Start, potentially degrading TCP performance.
The issue arises because the changes introduced in commit 4e1fddc98d25
("tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows")
moved the call to `bictcp_hystart_reset` inside the `hystart_update` function.
This modification added an additional condition for triggering the call,
requiring that (tcp_snd_cwnd(tp) >= hystart_low_window) must also
be satisfied before `bictcp_hystart_reset` is invoked.
This fix ensures that `bictcp_hystart_reset` is correctly called
at the start of a new round, regardless of the congestion window size.
This is achieved by moving the condition
(tcp_snd_cwnd(tp) >= hystart_low_window)
from before calling `bictcp_hystart_reset` to after it.
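For illustration, the resulting ordering in hystart_update() looks
roughly like this (a sketch; details may differ from the actual patch):
static void hystart_update(struct sock *sk, u32 delay)
{
	struct tcp_sock *tp = tcp_sk(sk);
	struct bictcp *ca = inet_csk_ca(sk);

	/* Mark the start of a new round first, regardless of cwnd. */
	if (after(tp->snd_una, ca->end_seq))
		bictcp_hystart_reset(sk);

	/* The cwnd threshold check now comes after the reset. */
	if (tcp_snd_cwnd(tp) < hystart_low_window)
		return;

	/* ... ACK train and delay detection unchanged ... */
}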
I tested with a client and a server connected through two Linux software routers.
In this setup, the minimum RTT was 150 ms, the bottleneck bandwidth was 50 Mbps,
and the bottleneck buffer size was 1 BDP, calculated as (50M / 1514 / 8) * 0.150 = 619 packets.
I conducted the test twice, transferring data from the server to the client for 1.5 seconds.
Before the patch was applied, HYSTART-DELAY stopped the exponential growth of cwnd when
cwnd = 516, and the bottleneck link was not yet saturated (516 < 619).
After the patch was applied, HYSTART-ACK-TRAIN stopped the exponential growth of cwnd when
cwnd = 632, and the bottleneck link was saturated (632 > 619).
In this test, applying the patch resulted in 300 KB more data delivered.
Fixes: 4e1fddc98d25 ("tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows")
Signed-off-by: Mahdi Arghavani <ma.arghavani@yahoo.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Haibo Zhang <haibo.zhang@otago.ac.nz>
Cc: David Eyers <david.eyers@otago.ac.nz>
Cc: Abbas Arghavani <abbas.arghavani@mdu.se>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This change resolves warnings produced by the sparse tool: currently
there is a mismatch between the normal generic types used for the salt
and the endian-annotated types in the macsec driver code.
Endian-annotated types should be used here.
Sparse output:
warning: restricted ssci_t degrades to integer
warning: incorrect type in assignment (different base types)
expected restricted ssci_t [usertype] ssci
got unsigned int
warning: restricted __be64 degrades to integer
warning: incorrect type in assignment (different base types)
expected restricted __be64 [usertype] pn
got unsigned long long
Signed-off-by: Ales Nezbeda <anezbeda@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On a 32bit system the "keylen + sizeof(struct tipc_aead_key)" math could
have an integer wrapping issue. It doesn't matter because the "keylen"
is checked on the next line, but just to make life easier for static
analysis tools, let's re-order these conditions and avoid the integer
overflow.
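As a generic, self-contained illustration of the reordering (not the
driver code; the header struct and the KEYLEN_MAX bound are
placeholders):
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct aead_key_hdr { uint32_t keylen; };	/* placeholder header */
#define KEYLEN_MAX 64U				/* placeholder bound */

/* Checking keylen against its maximum first means the addition below
 * cannot wrap, even when size_t is 32 bits wide.
 */
static bool key_size_ok(size_t attrlen, uint32_t keylen)
{
	if (keylen > KEYLEN_MAX)
		return false;
	return attrlen >= sizeof(struct aead_key_hdr) + keylen;
}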
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Background: https://lore.kernel.org/r/20250107123615.161095-1-ericwouds@gmail.com
Since adding negotiation of in-band capabilities, it is no longer
sufficient to just look at the MLO_AN_xxx mode and PHY interface to
decide whether to do a major configuration, since the result now
depends on the capabilities of the attaching PHY.
Always trigger a major configuration in this case.
Testing log: https://lore.kernel.org/r/f20c9744-3953-40e7-a9c9-5534b25d2e2a@gmail.com
Reported-by: Eric Woudstra <ericwouds@gmail.com>
Tested-by: Eric Woudstra <ericwouds@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On machines that do not support the balanced profile, the value of
last_non_turbo_profile is invalid after initialization, which might
cause the driver to switch to an unsupported platform profile later.
Fix this by only setting last_non_turbo_profile to supported platform
profile values.
Fixes: 191e21f1a4c3 ("platform/x86: acer-wmi: use an ACPI bitmap to set the platform profile choices")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Tested-by: Hridesh MG <hridesh699@gmail.com>
Reviewed-by: Hridesh MG <hridesh699@gmail.com>
Link: https://lore.kernel.org/r/20250119201723.11102-3-W_Armin@gmx.de
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
On the Acer Swift SFG14-41, the events 8 - 1 and 8 - 0 are printed on
AC connect/disconnect. Ignore those events to avoid spamming the
kernel log with error messages.
Reported-by: Farhan Anwar <farhan.anwar8@gmail.com>
Closes: https://lore.kernel.org/platform-driver-x86/2ffb529d-e7c8-4026-a3b8-120c8e7afec8@gmail.com
Tested-by: Rayan Margham <rayanmargham4@gmail.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20250119201723.11102-2-W_Armin@gmx.de
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
As part of enabling TDX virtual machines, support separation of
private/shared EPT into separate roots.
Confidential computing solutions almost invariably have concepts of
private and shared memory, but they may differ a lot in the details.
In SEV, for example, the bit is handled more like a permission bit as
far as the page tables are concerned: the private/shared bit is not
included in the physical address.
For TDX, instead, the bit is more like a physical address bit, with
the host mapping private memory in one half of the address space and
shared in another. Furthermore, the two halves are mapped by different
EPT roots and only the shared half is managed by KVM; the private half
(also called Secure EPT in Intel documentation) gets managed by the
privileged TDX Module via SEAMCALLs.
As a result, the operations that actually change the private half of
the EPT are limited and relatively slow compared to reading a PTE. For
this reason the design for KVM is to keep a mirror of the private EPT in
host memory. This allows KVM to quickly walk the EPT and only perform the
slower private EPT operations when it needs to actually modify mid-level
private PTEs.
There are thus three sets of EPT page tables: external, mirror and
direct. In the case of TDX (the only user of this framework) the
first two cover private memory, whereas the third manages shared
memory:
external EPT - Hidden within the TDX module, modified via TDX module
calls.
mirror EPT - Bookkeeping tree used as an optimization by KVM, not
used by the processor.
direct EPT - Normal EPT that maps unencrypted shared memory.
Managed like the EPT of a normal VM.
Modifying external EPT
----------------------
Modifications to the mirrored page tables need to also perform the
same operations to the private page tables, which will be handled via
kvm_x86_ops. Although this prep series does not interact with the TDX
module at all to actually configure the private EPT, it does lay the
ground work for doing this.
In some ways updating the private EPT is as simple as plumbing PTE
modifications through to also call into the TDX module; however, the
locking is more complicated because inserting a single PTE cannot anymore
be done atomically with a single CMPXCHG. For this reason, the existing
FROZEN_SPTE mechanism is used whenever a call to the TDX module updates the
private EPT. FROZEN_SPTE acts basically as a spinlock on a PTE. Besides
protecting operation of KVM, it limits the set of cases in which the
TDX module will encounter contention on its own PTE locks.
Zapping external EPT
--------------------
While the framework tries to be relatively generic, and to be
understandable without knowing TDX much in detail, some requirements of
TDX sometimes leak; for example the private page tables also cannot be
zapped while the range has anything mapped, so the mirrored/private page
tables need to be protected from KVM operations that zap any non-leaf
PTEs, for example kvm_mmu_reset_context() or kvm_mmu_zap_all_fast().
For normal VMs, guest memory is zapped for several reasons: user
memory getting paged out by the guest, memslots getting deleted,
passthrough of devices with non-coherent DMA. Confidential computing
adds to these the conversion of memory between shared and private. These
operations must not zap any private memory that is in use by the guest.
This is possible because the only zapping that is out of the control
of KVM/userspace is paging out userspace memory, which cannot apply to
guestmemfd operations. Thus a TDX VM will only zap private memory from
memslot deletion and from conversion between private and shared memory
which is triggered by the guest.
To avoid zapping too much memory, enums are introduced so that operations
can choose to target only private or shared memory, and thus only
direct or mirror EPT. For example:
Memslot deletion - Private and shared
MMU notifier based zapping - Shared only
Conversion to shared - Private only
Conversion to private - Shared only
Other cases of zapping will not be supported for KVM, for example
APICv update or non-coherent DMA status update; for the latter, TDX will
simply require that the CPU supports self-snoop and honor guest PAT
unconditionally for shared memory.
|
|
Fix build warnings reported from linux-next.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/r/20250120192504.4a1965a0@canb.auug.org.au
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Follow the advice in Documentation/filesystems/sysfs.rst:
show() should only use sysfs_emit() or sysfs_emit_at() when formatting
the value to be returned to user space.
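The conversion pattern is, roughly (attribute and variable names
invented for illustration):
static ssize_t example_show(struct device *dev,
			    struct device_attribute *attr, char *buf)
{
	int value = 42;	/* placeholder for the driver's real state */

	/* before: return sprintf(buf, "%d\n", value); */
	return sysfs_emit(buf, "%d\n", value);
}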
Signed-off-by: Ai Chao <aichao@kylinos.cn>
Acked-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20250116081129.2902274-1-aichao@kylinos.cn
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Follow the advice in Documentation/filesystems/sysfs.rst:
show() should only use sysfs_emit() or sysfs_emit_at() when formatting
the value to be returned to user space.
Signed-off-by: Ai Chao <aichao@kylinos.cn>
Acked-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20250116081000.2900435-1-aichao@kylinos.cn
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Follow the advice in Documentation/filesystems/sysfs.rst:
show() should only use sysfs_emit() or sysfs_emit_at() when formatting
the value to be returned to user space.
Signed-off-by: Ai Chao <aichao@kylinos.cn>
Acked-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20250116080836.2890442-1-aichao@kylinos.cn
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
The following patch adds support for HP Victus 16-s1000 laptop series,
by adding and fixing the following functionalities, which can be
accessed through hwmon and platform_profile sysfs:
- Functional measured fan speed reading
- Ability to enable and disable maximum fan speed
- Platform profiles full setting ability for CPU and GPU
It sets appropriate CPU and GPU power settings on both AC and battery
power sources, for low-power, balanced and performance modes.
It has been thoroughly tested on a 16-s1034nf laptop based on a 8C9C DMI
board name, and behavior of the driver on previous boards is left
untouched thanks to the separated lists of DMI board names.
Signed-off-by: Julien ROBIN <julien.robin28@free.fr>
Link: https://lore.kernel.org/r/1c00f906-8500-41d5-be80-f9092b6a49f1@free.fr
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Merge updates of Intel thermal drivers for 6.14:
- Add support for Panther Lake processors in multiple places (Zhang
Rui, Srinivas Pandruvada).
- Remove explicit user_space governor selection from Intel thermal
drivers (Srinivas Pandruvada).
* thermal-intel:
thermal: intel: Fix compile issue when CONFIG_NET is not defined
thermal: intel: int340x: Panther Lake power floor and workload hint support
thermal: intel: int340x: Panther Lake DLVR support
thermal: intel: Remove explicit user_space governor selection
ACPI: DPTF: Support Panther Lake
thermal: intel: int340x: processor: Enable MMIO RAPL for Panther Lake
powercap: intel_rapl: Add support for Panther Lake platform
|
|
Make the completion of hypercalls go through the complete_hypercall
function pointer argument, no matter if the hypercall exits to
userspace or not. Previously, the code assumed that KVM_HC_MAP_GPA_RANGE
specifically went to userspace, and all the others did not; the new code
need not special case KVM_HC_MAP_GPA_RANGE and in fact does not care at
all whether there was an exit to userspace or not.
|
|
KVM/riscv changes for 6.14
- Svvptc, Zabha, and Ziccrse extension support for Guest/VM
- Virtualize SBI system suspend extension for Guest/VM
- Trap related exit statstics as SBI PMU firmware counters for Guest/VM
|