2025-01-20  Merge branch 'free-htab-element-out-of-bucket-lock' (Alexei Starovoitov)
Hou Tao says:

====================
The patch set continues the previous work [1] to move all the freeing of htab elements out of the bucket lock.

One motivation for the patch set is the locking problem reported by Sebastian [2]: freeing a bpf_timer under PREEMPT_RT may acquire a spin-lock (namely softirq_expiry_lock). However, the freeing procedure for an htab element has already held a raw spin-lock (namely the bucket lock), and it will trigger the warning "BUG: scheduling while atomic", as demonstrated by the selftests patch. Another motivation is to reduce the locked scope of the bucket lock. However, the patch set doesn't move all freeing of htab elements out of the bucket lock; it still keeps the freeing of special fields in pre-allocated hash maps under the protection of the bucket lock in htab_map_update_elem().

The patch set is structured as follows:

* Patch #1 moves the element freeing out of the bucket lock for htab_lru_map_delete_node(). However, the freeing is still in the locked scope of the LRU raw spin-lock.
* Patches #2~#3 move the element freeing out of the bucket lock for __htab_map_lookup_and_delete_elem().
* Patch #4 cancels the bpf_timer in two steps to fix the locking problem in htab_map_update_elem() for PREEMPT_RT.
* Patch #5 adds a selftest for the locking problem.

Please see the individual patches for more details. Comments are always welcome.

---
v3:
* patch #1: update the commit message to state that the freeing of special fields is still in the locked scope of the LRU raw spin-lock
* patch #4: cancel the bpf_timer in two steps only for PREEMPT_RT (suggested by Alexei)

v2: https://lore.kernel.org/bpf/20250109061901.2620825-1-houtao@huaweicloud.com
* cancel the bpf_timer in two steps instead of breaking the reuse and refill of per-cpu ->extra_elems into two steps

v1: https://lore.kernel.org/bpf/20250107085559.3081563-1-houtao@huaweicloud.com

[1]: https://lore.kernel.org/bpf/20241106063542.357743-1-houtao@huaweicloud.com
[2]: https://lore.kernel.org/bpf/20241106084527.4gPrMnHt@linutronix.de
====================

Link: https://patch.msgid.link/20250117101816.2101857-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-20  selftests/bpf: Add test case for the freeing of bpf_timer (Hou Tao)
The main purpose of the test is to demonstrate the locking problem with freeing a bpf_timer under PREEMPT_RT: when freeing a bpf_timer that is running on another CPU, hrtimer_cancel() in bpf_timer_cancel_and_free() will try to acquire a spin-lock (namely softirq_expiry_lock), but the freeing procedure has already held a raw spin-lock.

The test first creates two threads: one to start timers and the other to free timers. The start-timers thread starts the timers and then wakes up the free-timers thread to free them once all starts are complete. After the freeing, the free-timers thread wakes up the start-timers thread to complete the current iteration. A loop of 10 iterations is used.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250117101816.2101857-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-20  bpf: Cancel the running bpf_timer through kworker for PREEMPT_RT (Hou Tao)
During the update procedure, when overwriting an element in a pre-allocated htab, the freeing of the old_element is protected by the bucket lock. The reason the bucket lock is necessary is that the old_element has already been stashed in htab->extra_elems after alloc_htab_elem() returns. If the old_element were freed after the bucket lock is unlocked, the stashed element could be reused by a concurrent update procedure, and the freeing of the old_element would run concurrently with the reuse of the old_element. However, the invocation of check_and_free_fields() may acquire a spin-lock, which violates the lockdep rule because its caller has already held a raw spin-lock (the bucket lock). The following warning is reported when such a race happens:

  BUG: scheduling while atomic: test_progs/676/0x00000003
  3 locks held by test_progs/676:
  #0: ffffffff864b0240 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0x2c0/0x830
  #1: ffff88810e961188 (&htab->lockdep_key){....}-{2:2}, at: htab_map_update_elem+0x306/0x1500
  #2: ffff8881f4eac1b8 (&base->softirq_expiry_lock){....}-{2:2}, at: hrtimer_cancel_wait_running+0xe9/0x1b0
  Modules linked in: bpf_testmod(O)
  Preemption disabled at:
  [<ffffffff817837a3>] htab_map_update_elem+0x293/0x1500
  CPU: 0 UID: 0 PID: 676 Comm: test_progs Tainted: G ... 6.12.0+ #11
  Tainted: [W]=WARN, [O]=OOT_MODULE
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)...
  Call Trace:
  <TASK>
  dump_stack_lvl+0x57/0x70
  dump_stack+0x10/0x20
  __schedule_bug+0x120/0x170
  __schedule+0x300c/0x4800
  schedule_rtlock+0x37/0x60
  rtlock_slowlock_locked+0x6d9/0x54c0
  rt_spin_lock+0x168/0x230
  hrtimer_cancel_wait_running+0xe9/0x1b0
  hrtimer_cancel+0x24/0x30
  bpf_timer_delete_work+0x1d/0x40
  bpf_timer_cancel_and_free+0x5e/0x80
  bpf_obj_free_fields+0x262/0x4a0
  check_and_free_fields+0x1d0/0x280
  htab_map_update_elem+0x7fc/0x1500
  bpf_prog_9f90bc20768e0cb9_overwrite_cb+0x3f/0x43
  bpf_prog_ea601c4649694dbd_overwrite_timer+0x5d/0x7e
  bpf_prog_test_run_syscall+0x322/0x830
  __sys_bpf+0x135d/0x3ca0
  __x64_sys_bpf+0x75/0xb0
  x64_sys_call+0x1b5/0xa10
  do_syscall_64+0x3b/0xc0
  entry_SYSCALL_64_after_hwframe+0x4b/0x53
  ...
  </TASK>

It seems feasible to break the reuse and refill of per-cpu extra_elems into two independent parts: reuse the per-cpu extra_elems with the bucket lock held, and refill the old_element as a per-cpu extra_elem after the bucket lock is unlocked. However, that would make concurrent overwrite procedures on the same CPU return an unexpected -E2BIG error when the map is full.

Therefore, the patch fixes the locking problem by breaking the cancelling of the bpf_timer into two steps for PREEMPT_RT:
1) use hrtimer_try_to_cancel() and check its return value
2) if the timer is running, use hrtimer_cancel() through a kworker to cancel it again

Considering that the current implementation of hrtimer_cancel() will try to acquire an already-held softirq_expiry_lock when the timer is running, the steps above are reasonable. However, the approach also has a downside: when the timer is running, the cancelling of the timer is delayed when releasing the last map uref. The delay is also fixable (e.g., by breaking the cancelling of the bpf_timer into two parts: one in the locked scope, another in the unlocked scope), so it can be revised later if necessary.

It is a bit hard to decide on the right Fixes tag, because the problem depends on PREEMPT_RT, which is enabled in v6.12. Considering that the softirq_expiry_lock has existed since v5.4 and the bpf_timer was introduced in v5.15, the bpf_timer commit is used in the Fixes tag and an extra Depends-on tag is added to state the dependency on PREEMPT_RT.

Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
Depends-on: v6.12+ with PREEMPT_RT enabled
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Closes: https://lore.kernel.org/bpf/20241106084527.4gPrMnHt@linutronix.de
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>
Link: https://lore.kernel.org/r/20250117101816.2101857-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
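As a rough illustration of the two-step cancel described above, a minimal sketch, assuming a bpf_hrtimer layout with a cb.delete_work member as suggested by the bpf_timer_delete_work frame in the trace; not the merged code:

  static void bpf_timer_cancel_and_free_sketch(struct bpf_hrtimer *t)
  {
  	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
  		hrtimer_cancel(&t->timer);	/* blocking cancel is fine here */
  		return;
  	}

  	/* Step 1: non-blocking attempt; >= 0 means the callback is not
  	 * currently executing, so no sleeping lock can be reached. */
  	if (hrtimer_try_to_cancel(&t->timer) >= 0)
  		return;

  	/* Step 2: the callback is running on another CPU. Defer the blocking
  	 * hrtimer_cancel() to a kworker, where taking the (sleeping on RT)
  	 * softirq_expiry_lock is allowed. */
  	schedule_work(&t->cb.delete_work);
  }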
2025-01-20  bpf: Free element after unlock in __htab_map_lookup_and_delete_elem() (Hou Tao)
The freeing of special fields in a map value may acquire a spin-lock (e.g., the freeing of bpf_timer); however, the lookup_and_delete_elem procedure has already held a raw spin-lock, which violates the lockdep rule.

The running context of __htab_map_lookup_and_delete_elem() has already disabled migration, therefore it is OK to invoke free_htab_elem() after unlocking the bucket lock. Fix the potential problem by freeing the element after unlocking the bucket lock in __htab_map_lookup_and_delete_elem().

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250117101816.2101857-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
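A minimal sketch of the reordering, assuming the helper names from kernel/bpf/hashtab.c and eliding the value copy; not the exact patch:

  static long lookup_and_delete_sketch(struct bpf_htab *htab, struct bucket *b,
  				     struct hlist_nulls_head *head, u32 hash,
  				     void *key, u32 key_size,
  				     unsigned long flags)
  {
  	struct htab_elem *l;
  	long ret = 0;

  	l = lookup_elem_raw(head, hash, key, key_size);
  	if (!l) {
  		ret = -ENOENT;
  		goto out_unlock;
  	}
  	/* Unlink under the bucket lock so concurrent lookups miss it... */
  	hlist_nulls_del_rcu(&l->hash_node);

  out_unlock:
  	htab_unlock_bucket(htab, b, hash, flags);
  	/* ...but free only after the raw lock is dropped: freeing special
  	 * fields (e.g. a bpf_timer) may take a sleeping lock on PREEMPT_RT. */
  	if (l)
  		free_htab_elem(htab, l);
  	return ret;
  }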
2025-01-20  bpf: Bail out early in __htab_map_lookup_and_delete_elem() (Hou Tao)
Use a goto statement to bail out early when the target element is not found, instead of using a large else branch to handle the more likely case. This change doesn't affect functionality and simply makes the code cleaner.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>
Link: https://lore.kernel.org/r/20250117101816.2101857-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-20  bpf: Free special fields after unlock in htab_lru_map_delete_node() (Hou Tao)
When bpf_timer is used in an LRU hash map, calling check_and_free_fields() in htab_lru_map_delete_node() will invoke bpf_timer_cancel_and_free() to free the bpf_timer. If the timer is running on another CPU, hrtimer_cancel() will invoke hrtimer_cancel_wait_running() to spin on the current CPU to wait for the completion of the hrtimer callback, while the deletion has already acquired a raw spin-lock (the bucket lock).

To reduce the time holding the bucket lock, move the invocation of check_and_free_fields() out of the bucket lock. However, because htab_lru_map_delete_node() is invoked with the LRU raw spin-lock held, the freeing of special fields still happens in a locked scope.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>
Link: https://lore.kernel.org/r/20250117101816.2101857-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-20  Merge branches 'acpi-battery', 'acpi-fan' and 'acpi-misc' (Rafael J. Wysocki)
Merge ACPI battery and fan driver updates and miscellaneous ACPI changes for 6.14:

- Update messages printed by the ACPI battery driver to always refer to driver extensions as "hooks" to avoid confusion with similar functionality in the power supply subsystem in the future (Thomas Weißschuh).
- Fix .probe() error path cleanup in the ACPI fan driver to avoid memory leaks (Joe Hattori).
- Constify 'struct bin_attribute' in some places in the ACPI subsystem and mark it as __ro_after_init in one place to prevent binary blob attributes from being updated (Thomas Weißschuh).
- Add empty stubs for several ACPI-related symbols so that they can be used when CONFIG_ACPI is unset, and use them to remove unnecessary conditional compilation from the ipu-bridge driver (Ricardo Ribalda).

* acpi-battery:
  ACPI: battery: Rename extensions to hook in messages

* acpi-fan:
  ACPI: fan: cleanup resources in the error path of .probe()

* acpi-misc:
  media: ipu-bridge: Remove unneeded conditional compilations
  ACPI: bus: implement acpi_device_hid when !ACPI
  ACPI: bus: implement for_each_acpi_consumer_dev when !ACPI
  ACPI: header: implement acpi_device_handle when !ACPI
  ACPI: bus: implement acpi_get_physical_device_location when !ACPI
  ACPI: bus: implement for_each_acpi_dev_match when !ACPI
  ACPI: bus: change the prototype for acpi_get_physical_device_location
  ACPI: sysfs: Constify 'struct bin_attribute'
  ACPI: BGRT: Constify 'struct bin_attribute'
  ACPI: BGRT: Mark bin_attribute as __ro_after_init
2025-01-20  x86: use cmov for user address masking (Linus Torvalds)
This was a suggestion by David Laight, and while I was slightly worried that some micro-architecture would predict cmov like a conditional branch, there is little reason to actually believe any core would be that broken.

Intel documents that their existing cores treat CMOVcc as a data dependency that will constrain speculation in their "Speculative Execution Side Channel Mitigations" whitepaper:

  "Other instructions such as CMOVcc, AND, ADC, SBB and SETcc can also be used to prevent bounds check bypass by constraining speculative execution on current family 6 processors (Intel® Core™, Intel® Atom™, Intel® Xeon® and Intel® Xeon Phi™ processors)"

and while that leaves the future uarch issues open, that's certainly true of our traditional SBB usage too.

Any core that predicts CMOV will be unusable for various crypto algorithms that need data-independent timing stability, so let's just treat CMOV as the safe choice that simplifies the address masking by avoiding an extra instruction and doesn't need a temporary register.

Suggested-by: David Laight <David.Laight@aculab.com>
Link: https://www.intel.com/content/dam/develop/external/us/en/documents/336996-speculative-execution-side-channel-mitigations.pdf
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
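The idea, as a hedged sketch rather than the exact kernel code (user_ptr_max stands in for the precomputed maximum user address):

  /* Clamp a user pointer with a data-dependent cmov: no branch exists for
   * the CPU to mispredict, and speculation cannot run ahead with a kernel
   * address. */
  static inline void __user *mask_user_address_sketch(const void __user *ptr,
  						    unsigned long user_ptr_max)
  {
  	void __user *ret;

  	asm("cmp %1,%0\n\t"
  	    "cmova %1,%0"		/* if (ptr > max) ptr = max */
  	    : "=r" (ret)
  	    : "r" (user_ptr_max), "0" (ptr));
  	return ret;
  }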
2025-01-20  Merge branches 'acpi-osl', 'acpi-tables', 'acpi-property', 'acpi-prm' and 'acpi-apei' (Rafael J. Wysocki)
Merge assorted changes in ACPI library code for 6.14:

- Use usleep_range() instead of msleep() in acpi_os_sleep() to reduce excessive delays due to timer inaccuracy, mostly affecting system suspend and resume (Rafael Wysocki).
- Use str_enabled_disabled() string helpers in the ACPI tables parsing code to make it easier to follow (Sunil V L).
- Update device properties parsing on systems using ACPI so that data firmware nodes resulting from _DSD evaluation are treated as available in firmware nodes walks (Sakari Ailus).
- Fix missing guid_t declaration in linux/prmt.h (Robert Richter).
- Update the GHES handling code to follow the global panic= instead of overriding it by force-rebooting the system after a fatal hw error has been reported (Borislav Petkov).

* acpi-osl:
  ACPI: OSL: Use usleep_range() in acpi_os_sleep()

* acpi-tables:
  ACPI: tables: Use string choice helpers

* acpi-property:
  ACPI: property: Consider data nodes as being available

* acpi-prm:
  ACPI: PRM: Fix missing guid_t declaration in linux/prmt.h

* acpi-apei:
  APEI: GHES: Have GHES honor the panic= setting
2025-01-20  x86: use proper 'clac' and 'stac' opcode names (Linus Torvalds)
Back when we added SMAP support, not all versions of binutils understood the 'clac' and 'stac' instructions, so we implemented those instructions manually as ".byte" sequences.

But we've since upgraded the minimum version of binutils to version 2.25, which includes proper support for the SMAP instructions, and there's no reason for us to use line noise to express them any more.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
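For illustration, the before/after reads roughly like this, a sketch using the documented encodings (CLAC = 0F 01 CA, STAC = 0F 01 CB), not the kernel's actual macro names:

  static inline void clac_sketch(void)
  {
  	/* old: asm volatile(".byte 0x0f,0x01,0xca" ::: "memory"); */
  	asm volatile("clac" ::: "memory");	/* clear AC: forbid user access */
  }

  static inline void stac_sketch(void)
  {
  	/* old: asm volatile(".byte 0x0f,0x01,0xcb" ::: "memory"); */
  	asm volatile("stac" ::: "memory");	/* set AC: allow user access */
  }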
2025-01-20  iommufd/fault: Use a separate spinlock to protect fault->deliver list (Nicolin Chen)
The fault->mutex serializes the fault read()/write() fops and iommufd_fault_auto_response_faults(), mainly for fault->response. It was also conveniently used to fence fault->deliver in the poll() fop and in iommufd_fault_iopf_handler().

However, copy_from/to_user() may sleep if pagefaults are enabled. Thus, they could take a long time waiting for user pages to swap in, blocking iommufd_fault_iopf_handler() and its caller, which is typically a shared IRQ handler of an IOMMU driver, resulting in a potential global DOS.

Instead of reusing the mutex to protect the fault->deliver list, add a separate spinlock, nested under the mutex, to do the job, so that iommufd_fault_iopf_handler() is no longer blocked by copy_from/to_user(). Add a free_list in iommufd_auto_response_faults(), so the spinlock can simply fence a fast list_for_each_entry_safe routine.

Provide two deliver-list helpers for iommufd_fault_fops_read() to use:
- Fetch the first iopf_group out of the fault->deliver list
- Restore an iopf_group back to the head of the fault->deliver list

Lastly, move the mutex closer to the response in the fault structure, and update its kdoc accordingly.

Fixes: 07838f7fd529 ("iommufd: Add iommufd fault object")
Link: https://patch.msgid.link/r/20250117192901.79491-1-nicolinc@nvidia.com
Cc: stable@vger.kernel.org
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
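A minimal sketch of the first helper, assuming the field names fault->lock, fault->deliver and group->node; not the merged code:

  static struct iopf_group *
  iommufd_fault_deliver_fetch_sketch(struct iommufd_fault *fault)
  {
  	struct iopf_group *group = NULL;

  	/* Only this short, non-sleeping critical section fences the list,
  	 * so the IRQ-path producer is never blocked by copy_to_user(). */
  	spin_lock(&fault->lock);
  	if (!list_empty(&fault->deliver)) {
  		group = list_first_entry(&fault->deliver,
  					 struct iopf_group, node);
  		list_del(&group->node);
  	}
  	spin_unlock(&fault->lock);
  	return group;
  }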
2025-01-20  riscv: export __cpuid_to_hartid_map (Valentina Fernandez)
EXPORT_SYMBOL_GPL() is missing for the __cpuid_to_hartid_map array. Export this symbol to allow drivers compiled as modules to use cpuid_to_hartid_map().

Signed-off-by: Valentina Fernandez <valentina.fernandezalanis@microchip.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
2025-01-20  riscv: sbi: vendorid_list: Add Microchip Technology to the vendor list (Valentina Fernandez)
Add Microchip Technology to the RISC-V vendor list. Signed-off-by: Valentina Fernandez <valentina.fernandezalanis@microchip.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
2025-01-20  cpuidle: teo: Skip sleep length computation for low latency constraints (Rafael J. Wysocki)
If the idle state exit latency constraint is sufficiently low, it is better to avoid the additional latency of calling tick_nohz_get_sleep_length(). It is also not necessary to compute the sleep length in that case because shallow idle state selection will be forced regardless of the recent wakeup history.

Accordingly, skip the sleep length computation and subsequent checks if the exit latency constraint is low enough.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/6122398.lOV4Wx5bFT@rjwysocki.net
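A sketch of the early path, assuming a threshold constant named LATENCY_THRESHOLD_NS; the real governor keeps more state than shown:

  static s64 teo_sleep_length_sketch(s64 latency_req)
  {
  	ktime_t delta_tick;

  	/* A tight exit-latency constraint forces a shallow state regardless
  	 * of recent wakeup history, so skip the costly query entirely. */
  	if (latency_req < LATENCY_THRESHOLD_NS)
  		return KTIME_MAX;

  	return tick_nohz_get_sleep_length(&delta_tick);
  }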
2025-01-20  cpuidle: teo: Replace time_span_ns with a flag (Rafael J. Wysocki)
After recent updates, the time_span_ns field in struct teo_cpu has become an indicator of whether or not the most recent wakeup was "genuine", which may as well be represented by a bool field without calling local_clock(), so update the code accordingly.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/6010475.MhkbZ0Pkbq@rjwysocki.net
2025-01-20  cpuidle: teo: Simplify handling of total events count (Rafael J. Wysocki)
Instead of computing the total events count from scratch every time, decay it and add a PULSE value to it in teo_update(). No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/9388883.CDJkKcVGEf@rjwysocki.net
2025-01-20  cpuidle: teo: Skip getting the sleep length if wakeups are very frequent (Rafael J. Wysocki)
Commit 6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases") attempted to reduce the governor overhead in some cases by making it avoid obtaining the sleep length (the time till the next timer event) which may be costly.

Among other things, after the above commit, tick_nohz_get_sleep_length() was not called any more when idle state 0 was to be returned, which turned out to be problematic and the previous behavior in that respect was restored by commit 4b20b07ce72f ("cpuidle: teo: Don't count non-existent intercepts").

However, commit 6da8f9ba5a87 also caused the governor to avoid calling tick_nohz_get_sleep_length() on systems where idle state 0 is a "polling" one (that is, it is not really an idle state, but a loop continuously executed by the CPU) when the target residency of the idle state to be returned was low enough, so there was no practical need to refine the idle state selection in any way. This change was not removed by the other commit, so now on systems where idle state 0 is a "polling" one, tick_nohz_get_sleep_length() is called when idle state 0 is to be returned, but it is not called when a deeper idle state with sufficiently low target residency is to be returned. That is arguably confusing and inconsistent. Moreover, there is no specific reason why the behavior in question should depend on whether or not idle state 0 is a "polling" one.

One way to address this would be to make the governor always call tick_nohz_get_sleep_length() to obtain the sleep length, but that would effectively mean reverting commit 6da8f9ba5a87 and restoring the latency issue that was the reason for doing it. This approach is thus not particularly attractive.

To address it differently, notice that if a CPU is woken up very often, this is not likely to be caused by timers in the first place (user space has a default timer slack of 50 us and there are relatively few timers with a deadline shorter than several microseconds in the kernel) and even if it were the case, the potential benefit from using a deep idle state would then be questionable for latency reasons. Therefore, if the majority of CPU wakeups occur within several microseconds, it can be assumed that all wakeups in that range are non-timer and the sleep length need not be determined.

Accordingly, introduce a new metric for counting wakeups with the measured idle duration below RESIDENCY_THRESHOLD_NS and modify the idle state selection to skip the tick_nohz_get_sleep_length() invocation if idle state 0 has been selected or the target residency of the candidate idle state is below RESIDENCY_THRESHOLD_NS and the value of the new metric is at least 1/2 of the total event count; a sketch of this check follows below.

Since the above requires the measured idle duration to be determined every time, except for the cases when one of the safety nets has triggered in which the wakeup is counted as a hit in the deepest idle state idle residency range, update the handling of those cases to avoid skipping the idle duration computation when the CPU wakeup is "genuine".

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3851791.kQq0lBPeGt@rjwysocki.net
Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Renamed a struct field ]
[ rjw: Fixed typo in the subject and one in a comment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
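The new decision reads roughly as below; a sketch with assumed field names (short_idles for the new metric, total for the event count):

  static bool teo_sleep_length_needed_sketch(struct teo_cpu *cpu_data, int idx,
  					   s64 target_residency_ns)
  {
  	/* Skip tick_nohz_get_sleep_length() when a very shallow state is
  	 * already the candidate and at least half of all recorded wakeups
  	 * were short ("non-timer") idles. */
  	if (idx == 0 || target_residency_ns < RESIDENCY_THRESHOLD_NS)
  		return 2 * cpu_data->short_idles < cpu_data->total;

  	return true;
  }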
2025-01-20  cpuidle: teo: Simplify counting events used for tick management (Rafael J. Wysocki)
Replace the tick_hits metric with a new tick_intercepts one that can be used directly when deciding whether or not to stop the scheduler tick and update the governor functional description accordingly. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/1987985.PYKUYFuaPT@rjwysocki.net
2025-01-20  cpuidle: teo: Clarify two code comments (Rafael J. Wysocki)
Rewrite two code comments that are supposed to explain the code's behavior but are too concise or not sufficiently clear.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/8472971.T7Z3S40VBb@rjwysocki.net
[ rjw: Fixed 2 typos in new comments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-20  cpuidle: teo: Drop local variable prev_intercept_idx (Rafael J. Wysocki)
Local variable prev_intercept_idx in teo_select() is redundant because it cannot be 0 when candidate state index is 0. The prev_intercept_idx value is the index of the deepest enabled idle state, so if it is 0, state 0 is the deepest enabled idle state, in which case it must be the only enabled idle state, but then teo_select() would have returned early before initializing prev_intercept_idx. Thus prev_intercept_idx must be nonzero and the check of it against 0 always passes, so it can be dropped altogether. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/3327997.aeNJFYEL58@rjwysocki.net [ rjw: Fixed typo in the changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-20  cpuidle: teo: Combine candidate state index checks against 0 (Rafael J. Wysocki)
There are two candidate state index checks against 0 in teo_select() that need not be separate, so combine them and update comments around them. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/13676346.uLZWGnKmhe@rjwysocki.net
2025-01-20  cpuidle: teo: Reorder candidate state index checks (Rafael J. Wysocki)
Since constraint_idx may be 0, the candidate state index may change to 0 after assigning constraint_idx to it, so first check if it is greater than constraint_idx (and update it if so) and then check it against 0. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/1907276.tdWV9SEqCh@rjwysocki.net
2025-01-20  cpuidle: teo: Rearrange idle state lookup code (Rafael J. Wysocki)
Rearrange code in the idle state lookup loop in teo_select() to make it somewhat easier to follow and update comments around it. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/4619938.LvFx2qVVIh@rjwysocki.net
2025-01-20  Merge tag 'asoc-v6.14' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus (Takashi Iwai)
ASoC: Updates for v6.14

This was quite a quiet release for what I imagine are holiday related reasons, the diffstat is dominated by some Cirrus Logic KUnit tests. There's the usual mix of small improvements and fixes, plus a few new drivers and features. The diffstat includes some DRM changes due to work on HDMI audio.

- Allow clocking on each DAI in an audio graph card to be configured separately.
- Improved power management for Renesas RZ-SSI.
- KUnit testing for the Cirrus DSP framework.
- Memory to memory operation support for Freescale/NXP platforms.
- Support for pause operations in SOF.
- Support for Allwinner suniv F1C100s, Awinic AW88083, Realtek ALC5682I-VE
2025-01-20  ASoC: codecs: ES8326: Improved PSRR (Zhang Yi)
Modified the configuration to improve PSRR while the ES8326 is working.

Signed-off-by: Zhang Yi <zhangyi@everest-semi.com>
Link: https://patch.msgid.link/20250120101758.13347-1-zhangyi@everest-semi.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2025-01-20  ASoC: fsl_asrc_m2m: return error value in asrc_m2m_device_run() (Shengjiu Wang)
The asrc_m2m_device_run() function is the main processing function for conversion; errors need to be returned to user space so that user space can handle the error case properly.

Fixes: 24a01710f627 ("ASoC: fsl_asrc_m2m: Add memory to memory function")
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Link: https://patch.msgid.link/20250120081938.2501554-3-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2025-01-20  ASoC: fsl_asrc_m2m: only handle pairs for m2m in the suspend (Shengjiu Wang)
ASRC memory-to-memory cases and memory-to-peripheral cases share the same pair pools, so the pairs obtained by the m2m suspend function may actually be in use for memory-to-peripheral, which is handled by the memory-to-peripheral driver and can't be handled in the memory-to-memory suspend function.

Use "pair->dma_buffer" as a flag for the memory-to-memory case: when it is allocated, handle the suspend operation for the related pairs.

Fixes: 24a01710f627 ("ASoC: fsl_asrc_m2m: Add memory to memory function")
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Link: https://patch.msgid.link/20250120081938.2501554-2-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2025-01-20  ALSA: hda: tas2781-spi: Fix -Wsometimes-uninitialized in tasdevice_spi_switch_book() (Nathan Chancellor)
Clang warns (or errors with CONFIG_WERROR=y):

  sound/pci/hda/tas2781_hda_spi.c:110:6: error: variable 'ret' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
    110 |         if (tas_priv->cur_book != TASDEVICE_BOOK_ID(reg)) {
        |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  sound/pci/hda/tas2781_hda_spi.c:119:9: note: uninitialized use occurs here
    119 |         return ret;
        |                ^~~
  sound/pci/hda/tas2781_hda_spi.c:110:2: note: remove the 'if' if its condition is always true
    110 |         if (tas_priv->cur_book != TASDEVICE_BOOK_ID(reg)) {
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  sound/pci/hda/tas2781_hda_spi.c:108:9: note: initialize the variable 'ret' to silence this warning
    108 |         int ret;
        |             ^
        |              = 0

Sink the declaration of ret into the if block and just return 0 at the end of the function, as there is nothing to do if cur_book has already been changed.

Fixes: bb5f86ea50ff ("ALSA: hda/tas2781: Add tas2781 hda SPI driver")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501192006.Hm9GmKiV-lkp@intel.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20250120-tas2781_hda_spi-fix-wsometimes-uninitialized-v1-1-d7fd104aa63e@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
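The shape of the fix, as a sketch (the book-switch helper named here is hypothetical; only the control flow mirrors the description):

  static int tasdevice_spi_switch_book_sketch(struct tasdevice_priv *tas_priv,
  					    unsigned int reg)
  {
  	if (tas_priv->cur_book != TASDEVICE_BOOK_ID(reg)) {
  		/* 'ret' now lives only where it is always initialized. */
  		int ret = tasdevice_spi_set_book(tas_priv, reg); /* hypothetical */

  		if (ret < 0)
  			return ret;
  		tas_priv->cur_book = TASDEVICE_BOOK_ID(reg);
  	}

  	return 0;	/* nothing to do: the right book is already selected */
  }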
2025-01-20  Merge branch 'for-6.14/selftests-trivial' into for-linus (Petr Mladek)
2025-01-20  Merge branch 'for-6.14-cpu_sync-fixup' into for-linus (Petr Mladek)
2025-01-20  net: phy: realtek: HWMON support for standalone versions of RTL8221B and RTL8251 (Aleksander Jan Bajkowski)
HWMON support has been added for the RTL8221/8251 PHYs integrated together with the MAC inside the RTL8125/8126 chips. This patch extends temperature reading support to the standalone variants of the mentioned PHYs.

I don't know whether the earlier revisions of the RTL8226 also have a built-in temperature sensor, so they have been skipped for now.

Tested on RTL8221B-VB-CG.

Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-01-20  tcp_cubic: fix incorrect HyStart round start detection (Mahdi Arghavani)
I noticed that HyStart incorrectly marks the start of rounds, leading to inaccurate measurements of ACK train lengths and resetting the `ca->sample_cnt` variable. This inaccuracy can impact HyStart's functionality in terminating exponential cwnd growth during Slow-Start, potentially degrading TCP performance.

The issue arises because the changes introduced in commit 4e1fddc98d25 ("tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows") moved the caller of the `bictcp_hystart_reset` function inside the `hystart_update` function. This modification added an additional condition for triggering the caller, requiring that (tcp_snd_cwnd(tp) >= hystart_low_window) must also be satisfied before invoking `bictcp_hystart_reset`.

This fix ensures that `bictcp_hystart_reset` is correctly called at the start of a new round, regardless of the congestion window size. This is achieved by moving the condition (tcp_snd_cwnd(tp) >= hystart_low_window) from before calling `bictcp_hystart_reset` to after it; see the sketch after this entry.

I tested with a client and a server connected through two Linux software routers. In this setup, the minimum RTT was 150 ms, the bottleneck bandwidth was 50 Mbps, and the bottleneck buffer size was 1 BDP, calculated as (50M / 1514 / 8) * 0.150 = 619 packets. I conducted the test twice, transferring data from the server to the client for 1.5 seconds. Before the patch was applied, HYSTART-DELAY stopped the exponential growth of cwnd when cwnd = 516, and the bottleneck link was not yet saturated (516 < 619). After the patch was applied, HYSTART-ACK-TRAIN stopped the exponential growth of cwnd when cwnd = 632, and the bottleneck link was saturated (632 > 619). In this test, applying the patch resulted in 300 KB more data delivered.

Fixes: 4e1fddc98d25 ("tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows")
Signed-off-by: Mahdi Arghavani <ma.arghavani@yahoo.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Haibo Zhang <haibo.zhang@otago.ac.nz>
Cc: David Eyers <david.eyers@otago.ac.nz>
Cc: Abbas Arghavani <abbas.arghavani@mdu.se>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
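A sketch of hystart_update() after the fix (the detection logic is elided; names follow net/ipv4/tcp_cubic.c):

  static void hystart_update_sketch(struct sock *sk, u32 delay)
  {
  	struct tcp_sock *tp = tcp_sk(sk);
  	struct bictcp *ca = inet_csk_ca(sk);

  	/* Mark the start of a new round unconditionally... */
  	if (after(tp->snd_una, ca->end_seq))
  		bictcp_hystart_reset(sk);

  	/* ...and only then skip small congestion windows. */
  	if (tcp_snd_cwnd(tp) < hystart_low_window)
  		return;

  	/* ACK-train and delay detection follow here in the real code. */
  }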
2025-01-20  net: macsec: Add endianness annotations in salt struct (Ales Nezbeda)
This change resolves a warning produced by the sparse tool, as currently there is a mismatch between the normal generic types in the salt and the endian-annotated types used elsewhere in the macsec driver code. Endian-annotated types should be used here.

Sparse output:

  warning: restricted ssci_t degrades to integer
  warning: incorrect type in assignment (different base types)
    expected restricted ssci_t [usertype] ssci
    got unsigned int
  warning: restricted __be64 degrades to integer
  warning: incorrect type in assignment (different base types)
    expected restricted __be64 [usertype] pn
    got unsigned long long

Signed-off-by: Ales Nezbeda <anezbeda@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
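The annotated layout implied by the warnings, as a sketch (ssci_t is a __be32 typedef in include/net/macsec.h; the struct name here is illustrative):

  struct macsec_salt_sketch {
  	__be64 pn;	/* was u64: big-endian packet number */
  	ssci_t ssci;	/* was u32: big-endian short SCI */
  } __packed;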
2025-01-20  tipc: re-order conditions in tipc_crypto_key_rcv() (Dan Carpenter)
On a 32bit system the "keylen + sizeof(struct tipc_aead_key)" math could have an integer wrapping issue. It doesn't matter because the "keylen" is checked on the next line, but just to make life easier for static analysis tools, let's re-order these conditions and avoid the integer overflow. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
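The reordered check, as a sketch (the bound macro name is an assumption); validating keylen first means the addition can no longer wrap on 32-bit:

  static bool tipc_aead_key_size_ok_sketch(unsigned int keylen,
  					 unsigned int size)
  {
  	/* keylen is bounded before it feeds the addition below. */
  	return keylen <= TIPC_AEAD_KEY_SIZE_MAX &&
  	       size >= keylen + sizeof(struct tipc_aead_key);
  }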
2025-01-20  net: phylink: always do a major config when attaching a SFP PHY (Russell King (Oracle))
Background: https://lore.kernel.org/r/20250107123615.161095-1-ericwouds@gmail.com Since adding negotiation of in-band capabilities, it is no longer sufficient to just look at the MLO_AN_xxx mode and PHY interface to decide whether to do a major configuration, since the result now depends on the capabilities of the attaching PHY. Always trigger a major configuration in this case. Testing log: https://lore.kernel.org/r/f20c9744-3953-40e7-a9c9-5534b25d2e2a@gmail.com Reported-by: Eric Woudstra <ericwouds@gmail.com> Tested-by: Eric Woudstra <ericwouds@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-01-20  platform/x86: acer-wmi: Fix initialization of last_non_turbo_profile (Armin Wolf)
On machines that do not support the balanced profile, the value of last_non_turbo_profile is invalid after initialization, which might cause the driver to switch to an unsupported platform profile later. Fix this by only setting last_non_turbo_profile to supported platform profile values.

Fixes: 191e21f1a4c3 ("platform/x86: acer-wmi: use an ACPI bitmap to set the platform profile choices")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Tested-by: Hridesh MG <hridesh699@gmail.com>
Reviewed-by: Hridesh MG <hridesh699@gmail.com>
Link: https://lore.kernel.org/r/20250119201723.11102-3-W_Armin@gmx.de
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-01-20  platform/x86: acer-wmi: Ignore AC events (Armin Wolf)
On the Acer Swift SFG14-41, the events 8 - 1 and 8 - 0 are printed on AC connect/disconnect. Ignore those events to avoid spamming the kernel log with error messages. Reported-by: Farhan Anwar <farhan.anwar8@gmail.com> Closes: https://lore.kernel.org/platform-driver-x86/2ffb529d-e7c8-4026-a3b8-120c8e7afec8@gmail.com Tested-by: Rayan Margham <rayanmargham4@gmail.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Armin Wolf <W_Armin@gmx.de> Link: https://lore.kernel.org/r/20250119201723.11102-2-W_Armin@gmx.de Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-01-20  Merge branch 'kvm-mirror-page-tables' into HEAD (Paolo Bonzini)
As part of enabling TDX virtual machines, support separation of private/shared EPT into separate roots.

Confidential computing solutions almost invariably have concepts of private and shared memory, but they may differ a lot in the details. In SEV, for example, the bit is handled more like a permission bit as far as the page tables are concerned: the private/shared bit is not included in the physical address. For TDX, instead, the bit is more like a physical address bit, with the host mapping private memory in one half of the address space and shared in another. Furthermore, the two halves are mapped by different EPT roots and only the shared half is managed by KVM; the private half (also called Secure EPT in Intel documentation) gets managed by the privileged TDX Module via SEAMCALLs.

As a result, the operations that actually change the private half of the EPT are limited and relatively slow compared to reading a PTE. For this reason the design for KVM is to keep a mirror of the private EPT in host memory. This allows KVM to quickly walk the EPT and only perform the slower private EPT operations when it needs to actually modify mid-level private PTEs.

There are thus three sets of EPT page tables: external, mirror and direct. In the case of TDX (the only user of this framework) the first two cover private memory, whereas the third manages shared memory:

  external EPT - Hidden within the TDX module, modified via TDX module calls.
  mirror EPT   - Bookkeeping tree used as an optimization by KVM, not used by the processor.
  direct EPT   - Normal EPT that maps unencrypted shared memory. Managed like the EPT of a normal VM.

Modifying external EPT
----------------------

Modifications to the mirrored page tables need to also perform the same operations on the private page tables, which will be handled via kvm_x86_ops. Although this prep series does not interact with the TDX module at all to actually configure the private EPT, it does lay the groundwork for doing so.

In some ways updating the private EPT is as simple as plumbing PTE modifications through to also call into the TDX module; however, the locking is more complicated because inserting a single PTE can no longer be done atomically with a single CMPXCHG. For this reason, the existing FROZEN_SPTE mechanism is used whenever a call to the TDX module updates the private EPT. FROZEN_SPTE acts basically as a spinlock on a PTE. Besides protecting the operation of KVM, it limits the set of cases in which the TDX module will encounter contention on its own PTE locks.

Zapping external EPT
--------------------

While the framework tries to be relatively generic, and to be understandable without knowing TDX much in detail, some requirements of TDX sometimes leak; for example the private page tables also cannot be zapped while the range has anything mapped, so the mirrored/private page tables need to be protected from KVM operations that zap any non-leaf PTEs, for example kvm_mmu_reset_context() or kvm_mmu_zap_all_fast().

For normal VMs, guest memory is zapped for several reasons: user memory getting paged out by the guest, memslots getting deleted, passthrough of devices with non-coherent DMA. Confidential computing adds to these the conversion of memory between shared and private. These operations must not zap any private memory that is in use by the guest. This is possible because the only zapping that is out of the control of KVM/userspace is paging out userspace memory, which cannot apply to guestmemfd operations. Thus a TDX VM will only zap private memory from memslot deletion and from conversion between private and shared memory, which is triggered by the guest.

To avoid zapping too much memory, enums are introduced so that operations can choose to target only private or shared memory, and thus only the direct or mirror EPT; a sketch follows below. For example:

  Memslot deletion           - Private and shared
  MMU notifier based zapping - Shared only
  Conversion to shared       - Private only
  Conversion to private      - Shared only

Other cases of zapping will not be supported for KVM, for example APICv update or non-coherent DMA status update; for the latter, TDX will simply require that the CPU supports self-snoop and honor guest PAT unconditionally for shared memory.
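A sketch of the enum idea referenced above; the names are illustrative, not the ones merged into KVM:

  enum kvm_zap_target_sketch {
  	ZAP_SHARED_ONLY,	/* MMU-notifier zaps, conversion to private */
  	ZAP_PRIVATE_ONLY,	/* conversion to shared */
  	ZAP_PRIVATE_AND_SHARED,	/* memslot deletion */
  };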
2025-01-20  samples/vfs: fix build warnings (Christian Brauner)
Fix build warnings reported from linux-next. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/r/20250120192504.4a1965a0@canb.auug.org.au Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-20  platform/mellanox: mlxreg-io: use sysfs_emit() instead of sprintf() (Ai Chao)
Follow the advice in Documentation/filesystems/sysfs.rst: show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: Ai Chao <aichao@kylinos.cn> Acked-by: Vadim Pasternak <vadimp@nvidia.com> Link: https://lore.kernel.org/r/20250116081129.2902274-1-aichao@kylinos.cn Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-01-20  platform/mellanox: mlxreg-hotplug: use sysfs_emit() instead of sprintf() (Ai Chao)
Follow the advice in Documentation/filesystems/sysfs.rst: show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: Ai Chao <aichao@kylinos.cn> Acked-by: Vadim Pasternak <vadimp@nvidia.com> Link: https://lore.kernel.org/r/20250116081000.2900435-1-aichao@kylinos.cn Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-01-20  platform/mellanox: mlxbf-bootctl: use sysfs_emit() instead of sprintf() (Ai Chao)
Follow the advice in Documentation/filesystems/sysfs.rst: show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: Ai Chao <aichao@kylinos.cn> Acked-by: Vadim Pasternak <vadimp@nvidia.com> Link: https://lore.kernel.org/r/20250116080836.2890442-1-aichao@kylinos.cn Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
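The pattern applied in the three commits above, as a sketch with an illustrative attribute and value:

  static ssize_t version_show(struct device *dev, struct device_attribute *attr,
  			    char *buf)
  {
  	u32 regval = 0;	/* would be read from hardware in the real drivers */

  	/* sysfs_emit() knows buf is a PAGE_SIZE sysfs buffer and bounds the
  	 * write, unlike a bare sprintf(). */
  	return sysfs_emit(buf, "%u\n", regval);
  }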
2025-01-20  platform/x86: hp-wmi: Add fan and thermal profile support for Victus 16-s1000 (Julien ROBIN)
The following patch adds support for the HP Victus 16-s1000 laptop series by adding and fixing the following functionalities, which can be accessed through the hwmon and platform_profile sysfs interfaces:

- Functional measured fan speed reading
- Ability to enable and disable maximum fan speed
- Full platform profile setting ability for CPU and GPU

It sets appropriate CPU and GPU power settings both on AC and battery power sources, for low-power, balanced and performance modes. It has been thoroughly tested on a 16-s1034nf laptop based on a 8C9C DMI board name, and the behavior of the driver on previous boards is left untouched thanks to the separated lists of DMI board names.

Signed-off-by: Julien ROBIN <julien.robin28@free.fr>
Link: https://lore.kernel.org/r/1c00f906-8500-41d5-be80-f9092b6a49f1@free.fr
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-01-20  Merge branch 'thermal-intel' (Rafael J. Wysocki)
Merge updates of Intel thermal drivers for 6.14:

- Add support for Panther Lake processors in multiple places (Zhang Rui, Srinivas Pandruvada).
- Remove explicit user_space governor selection from Intel thermal drivers (Srinivas Pandruvada).

* thermal-intel:
  thermal: intel: Fix compile issue when CONFIG_NET is not defined
  thermal: intel: int340x: Panther Lake power floor and workload hint support
  thermal: intel: int340x: Panther Lake DLVR support
  thermal: intel: Remove explicit user_space governor selection
  ACPI: DPTF: Support Panther Lake
  thermal: intel: int340x: processor: Enable MMIO RAPL for Panther Lake
  powercap: intel_rapl: Add support for Panther Lake platform
2025-01-20  Merge branch 'kvm-userspace-hypercall' into HEAD (Paolo Bonzini)
Make the completion of hypercalls go through the complete_hypercall function pointer argument, regardless of whether the hypercall exits to userspace or not. Previously, the code assumed that KVM_HC_MAP_GPA_RANGE specifically went to userspace and all the others did not; the new code need not special-case KVM_HC_MAP_GPA_RANGE and in fact does not care at all whether there was an exit to userspace or not.
2025-01-20  Merge tag 'kvm-riscv-6.14-1' of https://github.com/kvm-riscv/linux into HEAD (Paolo Bonzini)
KVM/riscv changes for 6.14

- Svvptc, Zabha, and Ziccrse extension support for Guest/VM
- Virtualize SBI system suspend extension for Guest/VM
- Trap-related exit statistics as SBI PMU firmware counters for Guest/VM
2025-01-20  Merge tag 'cpufreq-arm-updates-6.14' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm (Rafael J. Wysocki)
Merge ARM cpufreq updates for 6.14 from Viresh Kumar:

"- Extended support for more SoCs in apple cpufreq driver (Hector Martin and Nick Chan).
 - Add new cpufreq driver for Airoha SoCs (Christian Marangi).
 - Fix using cpufreq-dt as module (Andreas Kemnade).
 - Minor fixes for Sparc, scmi, and Qcom drivers (Ethan Carter Edwards, Sibi Sankar and Manivannan Sadhasivam)."

* tag 'cpufreq-arm-updates-6.14' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  cpufreq: airoha: Add EN7581 CPUFreq SMCCC driver
  cpufreq: sparc: change kzalloc to kcalloc
  cpufreq: qcom: Implement clk_ops::determine_rate() for qcom_cpufreq* clocks
  cpufreq: qcom: Fix qcom_cpufreq_hw_recalc_rate() to query LUT if LMh IRQ is not available
  cpufreq: apple-soc: Add Apple A7-A8X SoC cpufreq support
  cpufreq: apple-soc: Set fallback transition latency to APPLE_DVFS_TRANSITION_TIMEOUT
  cpufreq: apple-soc: Increase cluster switch timeout to 400us
  cpufreq: apple-soc: Use 32-bit read for status register
  cpufreq: apple-soc: Allow per-SoC configuration of APPLE_DVFS_CMD_PS1
  cpufreq: apple-soc: Drop setting the PS2 field on M2+
  dt-bindings: cpufreq: apple,cluster-cpufreq: Add A7-A11, T2 compatibles
  dt-bindings: cpufreq: Document support for Airoha EN7581 CPUFreq
  cpufreq: fix using cpufreq-dt as module
  cpufreq: scmi: Register for limit change notifications
2025-01-20  Merge tag 'opp-updates-6.14' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm (Rafael J. Wysocki)
Merge OPP (Operating Performance Points) updates for 6.14 from Viresh Kumar:

"- Minor cleanups / fixes (Dan Carpenter, Neil Armstrong, and Joe Hattori).
 - Implement dev_pm_opp_get_bw (Neil Armstrong).
 - Expose reference counting helpers (Viresh Kumar)."

* tag 'opp-updates-6.14' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  PM / OPP: Add reference counting helpers for Rust implementation
  OPP: OF: Fix an OF node leak in _opp_add_static_v2()
  OPP: fix dev_pm_opp_find_bw_*() when bandwidth table not initialized
  OPP: add index check to assert to avoid buffer overflow in _read_freq()
  opp: core: Fix off by one in dev_pm_opp_get_bw()
  opp: core: implement dev_pm_opp_get_bw
2025-01-20  samples/vfs: use shared header (Christian Brauner)
Share some infrastructure between sample programs and fix a build failure that was reported. Reported-by: Sasha Levin <sashal@kernel.org> Link: https://lore.kernel.org/r/Z42UkSXx0MS9qZ9w@lappy Link: https://qa-reports.linaro.org/lkft/sashal-linus-next/build/v6.13-rc7-511-g109a8e0fa9d6/testrun/26809210/suite/build/test/gcc-8-allyesconfig/log Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-20  Merge tag 'kvm-x86-misc-6.14' of https://github.com/kvm-x86/linux into HEAD (Paolo Bonzini)
KVM x86 misc changes for 6.14:

- Overhaul KVM's CPUID feature infrastructure to track all vCPU capabilities instead of just those where KVM needs to manage state and/or explicitly enable the feature in hardware. Along the way, refactor the code to make it easier to add features, and to make it more self-documenting how KVM is handling each feature.
- Rework KVM's handling of VM-Exits during event vectoring; this plugs holes where KVM unintentionally puts the vCPU into infinite loops in some scenarios (e.g. if emulation is triggered by the exit), and brings parity between VMX and SVM.
- Add pending request and interrupt injection information to the kvm_exit and kvm_entry tracepoints respectively.
- Fix a relatively benign flaw where KVM would end up redoing RDPKRU when loading guest/host PKRU, due to a refactoring of the kernel helpers that didn't account for KVM's pre-checking of the need to do WRPKRU.